RFR: 8007806: Need a Throwables performance counter

Peter Levart peter.levart at gmail.com
Sun Feb 24 21:25:40 UTC 2013


On 02/24/2013 09:57 PM, David Holmes wrote:
> On 25/02/2013 6:18 AM, Peter Levart wrote:
>> Hi Alan, David, Nils,
>>
>> I just want to clear up something regarding the PerfCounter implementation.
>>
>> Access to 64bit value in native memory is through a direct buffer which
>> uses normal read/write (non-volatile, Unsafe.[get|set]Long). So on
>> processors that don't support atomic 64bit stores/loads, each access
>> results in two separate 32bit load/store accesses, right?
>
> Unsafe.get|setLong uses locking on those platforms.

Even if it does, it is important whether "all" accesses to this 64bit 
value are using locking and whether they are using the same lock. Aren't 
performance counters JVM native variables where just some of them happen 
to be updated from Java?

>
>> The PerfCounter methods that access the 64bit value are synchronized,
>> using PerfCounter instance as a lock. But how is this 64bit value
>> accessed for example in the jstat utility? Is it possible that jstat can
>> see one half of the 64bit value before the update and the other half
>> after the update?
>
> Does jstat access these values directly or only via the synchronized 
> methods? If the latter then the value can't be "torn" that way. The 
> sync method will store the value in 2 32-bit registers, and the 
> variable load in jstat will take two instructions, but nothing can 
> touch those registers.

I'm not saying that the value could be corrupted in any way, just that 
the unsynchronized observer (like jstat) could see it "torn" sometimes.
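
To make the "torn" case concrete, here is a minimal, single-threaded sketch of what I mean. It only simulates a 64bit slot that is written as two separate 32bit stores; the class and method names are made up for illustration and nothing here is taken from PerfCounter or jstat.

```java
final class TornReadDemo {
    /** Reassembles a 64-bit value from two 32-bit halves. */
    static long compose(int hi, int lo) {
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }

    /**
     * Increments a counter held as two 32-bit halves, storing the low half
     * first, and returns what an unsynchronized reader would see if it read
     * both halves between the two stores.
     */
    static long observedMidIncrement(long before) {
        int lo = (int) before;
        int hi = (int) (before >>> 32);
        long after = before + 1;

        lo = (int) after;                 // first 32-bit store (low half)
        long observed = compose(hi, lo);  // the observer reads here
        hi = (int) (after >>> 32);        // second 32-bit store (high half)

        // Once both stores are done the slot holds the correct value again.
        if (compose(hi, lo) != after) {
            throw new AssertionError("final value should be intact");
        }
        return observed;
    }

    public static void main(String[] args) {
        long before = 0xFFFFFFFFL; // incrementing carries into the high half
        System.out.printf("before=%d observed=%d after=%d%n",
                before, observedMidIncrement(before), before + 1);
        // prints: before=4294967295 observed=0 after=4294967296
    }
}
```

The stored value is never corrupted, but the observer can briefly see a number that is neither the old nor the new value. This only happens when the update carries between the two halves.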

Regards, Peter

>
> David
> -----
>
>> If this is true and it is not that important, then instead of a
>> synchronized update of 64bit counter, a 32bit CAS could be used,
>> optionally (rarely) followed by a second 32bit CAS, like for example:
>>
>> http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html 
>>
>>
>> I tried this on ARM v6 and it works much better than synchronized
>> access, but I don't know if it's acceptable. It guarantees eventual
>> correctness of the summed value if the only operation performed is add()
>> (no set() intermingled) and has the same possibility of incorrect
>> half-half reads by unsynchronized observers as the current PerfCounter.
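
The idea of a 32bit CAS optionally followed by a second 32bit CAS can be sketched roughly as follows. This is not the code from the webrev: it keeps the two halves in on-heap AtomicInteger fields instead of native memory, and the class name SplitCounter is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SplitCounter {
    private final AtomicInteger lo; // low 32 bits of the counter
    private final AtomicInteger hi; // high 32 bits of the counter

    SplitCounter(long initial) {
        lo = new AtomicInteger((int) initial);
        hi = new AtomicInteger((int) (initial >>> 32));
    }

    /** Adds value modulo 2^64 using at most two 32-bit CAS loops. */
    void add(long value) {
        int addLo = (int) value;
        int addHi = (int) (value >>> 32);

        // First CAS: update the low half.
        int oldLo, newLo;
        do {
            oldLo = lo.get();
            newLo = oldLo + addLo;
        } while (!lo.compareAndSet(oldLo, newLo));

        // Unsigned overflow of the low half means a carry into the high half.
        if (Integer.compareUnsigned(newLo, oldLo) < 0) {
            addHi++;
        }

        // Second CAS: only needed when the high half actually changes.
        if (addHi != 0) {
            int oldHi;
            do {
                oldHi = hi.get();
            } while (!hi.compareAndSet(oldHi, oldHi + addHi));
        }
    }

    /** Unsynchronized read; may be torn while an add() is between its two CASes. */
    long get() {
        return ((long) hi.get() << 32) | (lo.get() & 0xFFFFFFFFL);
    }
}
```

Because every carry is eventually applied to the high half, the sum is exact once all add() calls have completed, even though a concurrent get() may see a half-updated value in the meantime.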
>>
>> Here's the comparison of unpatched/patched PerfCounter.increment()
>> micro-benchmark on single-core ARM v6 (Raspberry Pi):
>>
>> *** Original PerfCounter, ARM v6
>>
>> #
>> # PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
>> #
>>             1 threads, Tavg =    269.34 ns/op (σ =   0.00 ns/op) [   269.34]
>>             2 threads, Tavg =  7,170.48 ns/op (σ = 410.77 ns/op) [ 6,783.73,  7,603.95]
>>             3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op) [11,792.33, 11,714.67, 12,639.26]
>>             4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op) [15,592.04, 18,511.52, 15,642.52, 14,818.16]
>>
>>
>> *** Patched PerfCounter, ARM v6
>>
>> #
>> # PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
>> #
>>             1 threads, Tavg =    166.21 ns/op (σ =   0.00 ns/op) [   166.21]
>>             2 threads, Tavg =    332.58 ns/op (σ =   0.12 ns/op) [   332.45,    332.70]
>>             3 threads, Tavg =    500.30 ns/op (σ =   0.22 ns/op) [   500.04,    500.29,    500.58]
>>             4 threads, Tavg =    667.95 ns/op (σ =   2.11 ns/op) [   665.22,    667.18,    668.40,    671.04]
>>
>>
>> Regards, Peter
>>
>>
>> On 02/24/2013 11:31 AM, David Holmes wrote:
>>> On 24/02/2013 6:50 PM, Peter Levart wrote:
>>>> Hi David,
>>>>
>>>> I thought it was ok to pass null, but I don't know the "portability"
>>>> issues in-depth. The javadoc for Unsafe says:
>>>>
>>>> /"This method refers to a variable by means of two parameters, and so
>>>> it provides (in effect) a double-register addressing mode for Java
>>>> variables. When the object reference is null, this method uses its
>>>> offset as an absolute address. This is similar in operation to methods
>>>> such as getInt(long), which provide (in effect) a single-register
>>>> addressing mode for non-Java variables. However, because Java variables
>>>> may have a different layout in memory from non-Java variables,
>>>> programmers should not assume that these two addressing modes are ever
>>>> equivalent. Also, programmers should remember that offsets from the
>>>> double-register addressing mode cannot be portably confused with longs
>>>> used in the single-register addressing mode."/
>>>
>>> That is the doc for getXXX but not for getAndAddXXX or
>>> compareAndSwapXXX. You can't have null here:
>>>
>>> UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject
>>> unsafe, jobject obj, jlong offset, jlong e, jlong x))
>>>   UnsafeWrapper("Unsafe_CompareAndSwapLong");
>>>   Handle p (THREAD, JNIHandles::resolve(obj));
>>>   jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset));
>>>   if (VM_Version::supports_cx8())
>>>     return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
>>>   else {
>>>     jboolean success = false;
>>>     ObjectLocker ol(p, THREAD);
>>>     if (*addr == e) { *addr = x; success = true; }
>>>     return success;
>>>   }
>>> UNSAFE_END
>>>
>>> David
>>> -----
>>>
>>>
>>>> Does anybody know the in-depth interpretation of the above? Is it only
>>>> the particular Java/native type differences (for example, endianness of
>>>> variables) that these two addressing modes might interpret differently
>>>> or something else too?
>>>>
>>>> Regards, Peter
>>>>
>>>>
>>>> On 02/24/2013 12:39 AM, David Holmes wrote:
>>>>> Peter,
>>>>>
>>>>> In your use of Unsafe you pass "null" as the object. I'm pretty
>>>>> certain you can't pass null here. Unsafe operates on fields or array
>>>>> elements.
>>>>>
>>>>> David
>>>>>
>>>>> On 24/02/2013 5:39 AM, Peter Levart wrote:
>>>>>> Hi Nils,
>>>>>>
>>>>>> If the counters are updated frequently from multiple threads, there
>>>>>> might be contention/scalability issues. Instead of synchronization on
>>>>>> updates, you might consider using atomic updates provided by
>>>>>> sun.misc.Unsafe, like for example:
>>>>>>
>>>>>>
>>>>>> Index: jdk/src/share/classes/sun/misc/PerfCounter.java
>>>>>> ===================================================================
>>>>>> --- jdk/src/share/classes/sun/misc/PerfCounter.java
>>>>>> +++ jdk/src/share/classes/sun/misc/PerfCounter.java
>>>>>> @@ -25,6 +25,8 @@
>>>>>>
>>>>>>   package sun.misc;
>>>>>>
>>>>>> +import sun.nio.ch.DirectBuffer;
>>>>>> +
>>>>>>   import java.nio.ByteBuffer;
>>>>>>   import java.nio.ByteOrder;
>>>>>>   import java.nio.LongBuffer;
>>>>>> @@ -50,6 +52,8 @@
>>>>>>   public class PerfCounter {
>>>>>>       private static final Perf perf =
>>>>>>           AccessController.doPrivileged(new Perf.GetPerfAction());
>>>>>> +    private static final Unsafe unsafe =
>>>>>> +        Unsafe.getUnsafe();
>>>>>>
>>>>>>       // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp
>>>>>>       private final static int V_Constant  = 1;
>>>>>> @@ -59,12 +63,14 @@
>>>>>>
>>>>>>       private final String name;
>>>>>>       private final LongBuffer lb;
>>>>>> +    private final DirectBuffer db;
>>>>>>
>>>>>>       private PerfCounter(String name, int type) {
>>>>>>           this.name = name;
>>>>>>           ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
>>>>>>           bb.order(ByteOrder.nativeOrder());
>>>>>>           this.lb = bb.asLongBuffer();
>>>>>> +        this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null;
>>>>>>       }
>>>>>>
>>>>>>       static PerfCounter newPerfCounter(String name) {
>>>>>> @@ -79,23 +85,44 @@
>>>>>>       /**
>>>>>>        * Returns the current value of the perf counter.
>>>>>>        */
>>>>>> -    public synchronized long get() {
>>>>>> +    public long get() {
>>>>>> +        if (db != null) {
>>>>>> +            return unsafe.getLongVolatile(null, db.address());
>>>>>> +        }
>>>>>> +        else {
>>>>>> +            synchronized (this) {
>>>>>> -        return lb.get(0);
>>>>>> -    }
>>>>>> +                return lb.get(0);
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>>
>>>>>>       /**
>>>>>>        * Sets the value of the perf counter to the given newValue.
>>>>>>        */
>>>>>> -    public synchronized void set(long newValue) {
>>>>>> +    public void set(long newValue) {
>>>>>> +        if (db != null) {
>>>>>> +            unsafe.putOrderedLong(null, db.address(), newValue);
>>>>>> +        }
>>>>>> +        else {
>>>>>> +            synchronized (this) {
>>>>>> -        lb.put(0, newValue);
>>>>>> -    }
>>>>>> +                lb.put(0, newValue);
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>>
>>>>>>       /**
>>>>>>        * Adds the given value to the perf counter.
>>>>>>        */
>>>>>> -    public synchronized void add(long value) {
>>>>>> -        long res = get() + value;
>>>>>> +    public void add(long value) {
>>>>>> +        if (db != null) {
>>>>>> +            unsafe.getAndAddLong(null, db.address(), value);
>>>>>> +        }
>>>>>> +        else {
>>>>>> +            synchronized (this) {
>>>>>> +                long res = lb.get(0) + value;
>>>>>> -        lb.put(0, res);
>>>>>> +                lb.put(0, res);
>>>>>> +            }
>>>>>> +        }
>>>>>>       }
>>>>>>
>>>>>>       /**
>>>>>>
>>>>>>
>>>>>>
>>>>>> Testing the PerfCounter.increment() method in a loop on multiple threads
>>>>>> sharing the same PerfCounter instance, for example, on a 4-core Intel i7
>>>>>> machine produces the following results:
>>>>>>
>>>>>> #
>>>>>> # PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
>>>>>> #
>>>>>>             1 threads, Tavg =     19.02 ns/op (σ = 0.00 ns/op)
>>>>>>             2 threads, Tavg =    109.93 ns/op (σ = 6.17 ns/op)
>>>>>>             3 threads, Tavg =    136.64 ns/op (σ = 2.99 ns/op)
>>>>>>             4 threads, Tavg =    293.26 ns/op (σ = 5.30 ns/op)
>>>>>>             5 threads, Tavg =    316.94 ns/op (σ = 6.28 ns/op)
>>>>>>             6 threads, Tavg =    686.96 ns/op (σ = 7.09 ns/op)
>>>>>>             7 threads, Tavg =    793.28 ns/op (σ = 10.57 ns/op)
>>>>>>             8 threads, Tavg =    898.15 ns/op (σ = 14.63 ns/op)
>>>>>>
>>>>>>
>>>>>> With the presented patch, the results are a little better:
>>>>>>
>>>>>> #
>>>>>> # PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
>>>>>> #
>>>>>> # Measure:
>>>>>>             1 threads, Tavg =      5.22 ns/op (σ = 0.00 ns/op)
>>>>>>             2 threads, Tavg =     34.51 ns/op (σ = 0.60 ns/op)
>>>>>>             3 threads, Tavg =     54.85 ns/op (σ = 1.42 ns/op)
>>>>>>             4 threads, Tavg =     74.67 ns/op (σ = 1.71 ns/op)
>>>>>>             5 threads, Tavg =     94.71 ns/op (σ = 41.68 ns/op)
>>>>>>             6 threads, Tavg =    114.80 ns/op (σ = 32.10 ns/op)
>>>>>>             7 threads, Tavg =    136.70 ns/op (σ = 26.80 ns/op)
>>>>>>             8 threads, Tavg =    158.48 ns/op (σ = 9.93 ns/op)
>>>>>>
>>>>>>
>>>>>> The scalability is not much better, but the raw speed is, so it might
>>>>>> present less contention when used in real-world code. If you wanted
>>>>>> even better scalability, there is a new class in JDK8,
>>>>>> java.util.concurrent.atomic.LongAdder. But that doesn't buy an atomic
>>>>>> "set()" - only "add()". And it can't update native-memory variables,
>>>>>> so it could only be used for add-only counters and in conjunction with
>>>>>> a background thread that would periodically flush the sum to the
>>>>>> native memory....
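
A rough sketch of the LongAdder variant mentioned above, for add-only counters. It assumes a plain direct LongBuffer as a stand-in for the perf-data slot that Perf.createLong() would provide; the names FlushedCounter, newSlot and flush are hypothetical, and the periodic scheduling of flush() is left to the caller.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.concurrent.atomic.LongAdder;

final class FlushedCounter {
    private final LongAdder adder = new LongAdder(); // scalable on-heap accumulator
    private final LongBuffer slot;                   // stand-in for the native counter slot

    FlushedCounter(LongBuffer slot) {
        this.slot = slot;
    }

    /** Hot path: touches only the LongAdder, never native memory. */
    void add(long value) {
        adder.add(value);
    }

    /** Publishes the current sum; meant to be called by a single background thread. */
    void flush() {
        slot.put(0, adder.sum());
    }

    /** Allocates an 8-byte direct buffer in native byte order (assumption for this sketch). */
    static LongBuffer newSlot() {
        return ByteBuffer.allocateDirect(Long.BYTES)
                .order(ByteOrder.nativeOrder())
                .asLongBuffer();
    }
}
```

Observers of the native slot then see a value that lags behind by up to one flush period, and a set() operation cannot be supported this way.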
>>>>>>
>>>>>> Regards, Peter
>>>>>>
>>>>>>
>>>>>> On 02/08/2013 06:10 PM, Nils Loodin wrote:
>>>>>>> It would be interesting to know the number of thrown throwables in
>>>>>>> the JVM, to be able to do some high level application diagnostics /
>>>>>>> statistics. A good way to expose this number would be a performance
>>>>>>> counter, since it is accessible both from Java and from the VM.
>>>>>>>
>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
>>>>>>> http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nils Loodin
>>>>>>
>>>>
>>



