RFR: 8007806: Need a Throwables performance counter
David Holmes
david.holmes at oracle.com
Sun Feb 24 22:18:14 UTC 2013
We've not-so-slightly hijacked Nils' thread here - apologies for that.
On 25/02/2013 8:05 AM, Peter Levart wrote:
>
> Just looked at one way jstat accesses the counters. It runs in a
> separate VM and maps-in a file that is already mapped in the observing
> VM in the direct buffer. It then accesses it via a LongBuffer view (for
> long counters). So there's no synchronization between counter updater
> and counter reader. On ARM v6 jstat could see a "torn" long counter
> then, right?
Right. The current implementation of PerfLongCounter uses simple
stores (not atomic ops).
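
To make the "torn" read concrete, here is a minimal single-threaded sketch. It assumes the 64-bit store is carried out as two 32-bit stores, low half first; the actual order on a given ARM v6 system is not stated in this thread. TornReadDemo and readBetweenHalfStores are hypothetical names, and a heap ByteBuffer stands in for the mapped counter file that jstat reads.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration, not code from the JDK or from jstat.
final class TornReadDemo {

    /**
     * Stores newValue into an 8-byte slot as two separate 32-bit stores
     * (low half first, an assumption) and returns what a reader would
     * observe if it loaded the whole slot between those two stores.
     */
    static long readBetweenHalfStores(long oldValue, long newValue) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        slot.putLong(0, oldValue);
        slot.putInt(0, (int) newValue);            // low 32 bits stored
        long observed = slot.getLong(0);           // the observer reads here
        slot.putInt(4, (int) (newValue >>> 32));   // high 32 bits stored
        return observed;
    }

    public static void main(String[] args) {
        long before = 0x00000000FFFFFFFFL;
        long after = before + 1;                   // carries into the high half
        long seen = readBetweenHalfStores(before, after);
        System.out.printf("before=%d after=%d observed=%d%n", before, after, seen);
    }
}
```

With these inputs the observer sees a value that is neither the old nor the new count, which is the failure mode being discussed. An increment that does not carry into the high half cannot be observed torn in this model.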
> The double-32bit-CAS updater that I presented would not make it worse
> then on such platforms, I suppose.
No change in tearing ability.
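
For readers who have not opened the webrev, the double-32bit-CAS idea can be sketched roughly as below. This is not the code from the webrev: SplitCounter is a hypothetical class, and two AtomicIntegers stand in for the two 32-bit halves of the native counter slot. It only illustrates the shape of the technique: a CAS on the low half, followed by a CAS on the high half only when there is something to carry.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: two AtomicIntegers model the two 32-bit halves
// of a counter slot. This is not the code from the webrev.
final class SplitCounter {
    private final AtomicInteger lo = new AtomicInteger();
    private final AtomicInteger hi = new AtomicInteger();

    SplitCounter(long initial) {
        lo.set((int) initial);
        hi.set((int) (initial >>> 32));
    }

    /** Adds value with a 32-bit CAS on the low half and, only if needed, one on the high half. */
    void add(long value) {
        int addLo = (int) value;
        int addHi = (int) (value >>> 32);
        int oldLo, newLo;
        do {
            oldLo = lo.get();
            newLo = oldLo + addLo;
        } while (!lo.compareAndSet(oldLo, newLo));
        // Unsigned wrap-around of the low half means a carry into the high half.
        int carry = Integer.compareUnsigned(newLo, oldLo) < 0 ? 1 : 0;
        int hiDelta = addHi + carry;
        if (hiDelta != 0) {
            int oldHi;
            do {
                oldHi = hi.get();
            } while (!hi.compareAndSet(oldHi, oldHi + hiDelta));
        }
    }

    /** Unsynchronized read of both halves; can miss a pending carry while adds are in flight. */
    long get() {
        return ((long) hi.get() << 32) | (lo.get() & 0xFFFFFFFFL);
    }

    /** Runs the given number of threads, each adding 1 perThread times, and returns the final value. */
    static long runThreads(int threads, int perThread, long initial) {
        SplitCounter c = new SplitCounter(initial);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) c.add(1);
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.get();
    }

    public static void main(String[] args) {
        // Start just below 2^32 so that the carry path is exercised.
        System.out.println(runThreads(4, 100_000, 0xFFFFFFF0L));
    }
}
```

Once all adders have finished, the two halves add up to the correct sum, which matches the "eventual correctness" claim for add-only use. A reader that runs between the two CAS steps can still see a half-updated value, so the tearing exposure for observers is unchanged.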
> On the platforms that support 64bit atomic stores, there are no such
> problems. And I assume those same platforms also support 64bit CAS, or
> are there platforms with 64bit atomic stores and no 64bit CAS?
Most of them actually :) All Java platforms must support atomic
load/store of 64-bit values to support volatile long and double
variables. On 32-bit platforms this is done via a range of techniques -
for example on x86 it is done via the FPU. But these atomic accesses are
currently restricted to Java volatile field accesses via bytecode -
they are not exposed via the Unsafe methods, nor are they made
available via the Atomic:: class in the VM.
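
As a small illustration of the volatile guarantee, the sketch below has one thread flip a volatile long between all-zero and all-one bits while another thread reads it. VolatileLongDemo and countTornReads are hypothetical names. The Java Language Specification (section 17.7) requires reads and writes of volatile long values to be atomic, so the reader should never see a mix of the two halves, whatever technique the platform uses underneath.

```java
// Hypothetical demo class; the names here are illustrative only.
final class VolatileLongDemo {
    // Volatile long: the JLS requires atomic reads and writes of this field.
    private static volatile long shared = 0L;

    /**
     * Flips the shared field between -1 and 0 on a writer thread while the
     * calling thread reads it, and returns how many reads saw any other value.
     */
    static int countTornReads(int iterations) {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                shared = (i & 1) == 0 ? -1L : 0L;
            }
        });
        writer.start();
        int torn = 0;
        while (writer.isAlive()) {
            long v = shared;
            if (v != 0L && v != -1L) {
                torn++;   // would indicate a mixed-half (torn) read
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(1_000_000));
    }
}
```

On a 64-bit VM a plain long would most likely pass this check as well, so the demo is only interesting on 32-bit platforms, and only once the volatile modifier is removed.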
Some of these 32-bit platforms also support the 64-bit CAS, which is
what supports_cx8() is intended to indicate.
If the PerfCounters were supposed to be thread-safe then they might use
these alternate atomic access operations.
David
> Regards, Peter
>
>>
>> David
>>
>>> Regards, Peter
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>> If this is true and it is not that important, then instead of a
>>>>> synchronized update of 64bit counter, a 32bit CAS could be used,
>>>>> optionally (rarely) followed by a second 32bit CAS, like for example:
>>>>>
>>>>> http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html
>>>>>
>>>>>
>>>>>
>>>>> I tried this on ARM v6 and it works much better than synchronized
>>>>> access, but I don't know if it's acceptable. It guarantees eventual
>>>>> correctness of summed value if the only operation performed is
>>>>> add() (no
>>>>> set() intermingled) and has the same possibility of incorrect
>>>>> half-half
>>>>> reads by observers as current PerfCounter has for unsynchronized
>>>>> observers.
>>>>>
>>>>> Here's the comparison of unpatched/patched PerfCounter.increment()
>>>>> micro-benchmark on single-core ARM v6 (Raspberry Pi):
>>>>>
>>>>> *** Original PerfCounter, ARM v6
>>>>>
>>>>> #
>>>>> # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1
>>>>> #
>>>>> 1 threads, Tavg =    269.34 ns/op (σ =     0.00 ns/op) [269.34]
>>>>> 2 threads, Tavg =  7,170.48 ns/op (σ =   410.77 ns/op) [6,783.73, 7,603.95]
>>>>> 3 threads, Tavg = 12,034.82 ns/op (σ =   418.99 ns/op) [11,792.33, 11,714.67, 12,639.26]
>>>>> 4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op) [15,592.04, 18,511.52, 15,642.52, 14,818.16]
>>>>>
>>>>>
>>>>> *** Patched PerfCounter, ARM v6
>>>>>
>>>>> #
>>>>> # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 1
>>>>> #
>>>>> 1 threads, Tavg = 166.21 ns/op (σ = 0.00 ns/op) [166.21]
>>>>> 2 threads, Tavg = 332.58 ns/op (σ = 0.12 ns/op) [332.45, 332.70]
>>>>> 3 threads, Tavg = 500.30 ns/op (σ = 0.22 ns/op) [500.04, 500.29, 500.58]
>>>>> 4 threads, Tavg = 667.95 ns/op (σ = 2.11 ns/op) [665.22, 667.18, 668.40, 671.04]
>>>>>
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>>
>>>>> On 02/24/2013 11:31 AM, David Holmes wrote:
>>>>>> On 24/02/2013 6:50 PM, Peter Levart wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> I thought it was ok to pass null, but I don't know the "portability"
>>>>>>> issues in-depth. The javadoc for Unsafe says:
>>>>>>>
>>>>>>> /"This method refers to a variable by means of two parameters, and
>>>>>>> so it
>>>>>>> provides (in effect) a double-register addressing mode for Java
>>>>>>> variables. When the object reference is null, this method uses its
>>>>>>> offset as an absolute address. This is similar in operation to
>>>>>>> methods
>>>>>>> such as getInt(long), which provide (in effect) a single-register
>>>>>>> addressing mode for non-Java variables. However, because Java
>>>>>>> variables
>>>>>>> may have a different layout in memory from non-Java variables,
>>>>>>> programmers should not assume that these two addressing modes are
>>>>>>> ever
>>>>>>> equivalent. Also, programmers should remember that offsets from the
>>>>>>> double-register addressing mode cannot be portably confused with
>>>>>>> longs
>>>>>>> used in the single-register addressing mode."/
>>>>>>
>>>>>> That is the doc for getXXX but not for getAndAddXXX or
>>>>>> compareAndSwapXXX. You can't have null here:
>>>>>>
>>>>>> UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject
>>>>>> unsafe, jobject obj, jlong offset, jlong e, jlong x))
>>>>>> UnsafeWrapper("Unsafe_CompareAndSwapLong");
>>>>>> Handle p (THREAD, JNIHandles::resolve(obj));
>>>>>> jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(),
>>>>>> offset));
>>>>>> if (VM_Version::supports_cx8())
>>>>>> return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
>>>>>> else {
>>>>>> jboolean success = false;
>>>>>> ObjectLocker ol(p, THREAD);
>>>>>> if (*addr == e) { *addr = x; success = true; }
>>>>>> return success;
>>>>>> }
>>>>>> UNSAFE_END
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>
>>>>>>> Does anybody know the in-depth interpretation of the above? Is it only
>>>>>>> the particular Java/native type differences (for example, endianness of
>>>>>>> variables) that these two addressing modes might interpret differently
>>>>>>> or something else too?
>>>>>>>
>>>>>>> Regards, Peter
>>>>>>>
>>>>>>>
>>>>>>> On 02/24/2013 12:39 AM, David Holmes wrote:
>>>>>>>> Peter,
>>>>>>>>
>>>>>>>> In your use of Unsafe you pass "null" as the object. I'm pretty
>>>>>>>> certain you can't pass null here. Unsafe operates on fields or
>>>>>>>> array
>>>>>>>> elements.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> On 24/02/2013 5:39 AM, Peter Levart wrote:
>>>>>>>>> Hi Nils,
>>>>>>>>>
>>>>>>>>> If the counters are updated frequently from multiple threads,
>>>>>>>>> there
>>>>>>>>> might be contention/scalability issues. Instead of
>>>>>>>>> synchronization on
>>>>>>>>> updates, you might consider using atomic updates provided by
>>>>>>>>> sun.misc.Unsafe, like for example:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Index: jdk/src/share/classes/sun/misc/PerfCounter.java
>>>>>>>>> ===================================================================
>>>>>>>>>
>>>>>>>>> --- jdk/src/share/classes/sun/misc/PerfCounter.java
>>>>>>>>> +++ jdk/src/share/classes/sun/misc/PerfCounter.java
>>>>>>>>> @@ -25,6 +25,8 @@
>>>>>>>>>
>>>>>>>>> package sun.misc;
>>>>>>>>>
>>>>>>>>> +import sun.nio.ch.DirectBuffer;
>>>>>>>>> +
>>>>>>>>> import java.nio.ByteBuffer;
>>>>>>>>> import java.nio.ByteOrder;
>>>>>>>>> import java.nio.LongBuffer;
>>>>>>>>> @@ -50,6 +52,8 @@
>>>>>>>>> public class PerfCounter {
>>>>>>>>> private static final Perf perf =
>>>>>>>>> AccessController.doPrivileged(new Perf.GetPerfAction());
>>>>>>>>> + private static final Unsafe unsafe =
>>>>>>>>> + Unsafe.getUnsafe();
>>>>>>>>>
>>>>>>>>> // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp
>>>>>>>>> private final static int V_Constant = 1;
>>>>>>>>> @@ -59,12 +63,14 @@
>>>>>>>>>
>>>>>>>>> private final String name;
>>>>>>>>> private final LongBuffer lb;
>>>>>>>>> + private final DirectBuffer db;
>>>>>>>>>
>>>>>>>>> private PerfCounter(String name, int type) {
>>>>>>>>> this.name = name;
>>>>>>>>> ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
>>>>>>>>> bb.order(ByteOrder.nativeOrder());
>>>>>>>>> this.lb = bb.asLongBuffer();
>>>>>>>>> + this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> static PerfCounter newPerfCounter(String name) {
>>>>>>>>> @@ -79,23 +85,44 @@
>>>>>>>>> /**
>>>>>>>>> * Returns the current value of the perf counter.
>>>>>>>>> */
>>>>>>>>> - public synchronized long get() {
>>>>>>>>> + public long get() {
>>>>>>>>> + if (db != null) {
>>>>>>>>> + return unsafe.getLongVolatile(null, db.address());
>>>>>>>>> + }
>>>>>>>>> + else {
>>>>>>>>> + synchronized (this) {
>>>>>>>>> - return lb.get(0);
>>>>>>>>> - }
>>>>>>>>> + return lb.get(0);
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>> * Sets the value of the perf counter to the given newValue.
>>>>>>>>> */
>>>>>>>>> - public synchronized void set(long newValue) {
>>>>>>>>> + public void set(long newValue) {
>>>>>>>>> + if (db != null) {
>>>>>>>>> + unsafe.putOrderedLong(null, db.address(), newValue);
>>>>>>>>> + }
>>>>>>>>> + else {
>>>>>>>>> + synchronized (this) {
>>>>>>>>> - lb.put(0, newValue);
>>>>>>>>> - }
>>>>>>>>> + lb.put(0, newValue);
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>> * Adds the given value to the perf counter.
>>>>>>>>> */
>>>>>>>>> - public synchronized void add(long value) {
>>>>>>>>> - long res = get() + value;
>>>>>>>>> + public void add(long value) {
>>>>>>>>> + if (db != null) {
>>>>>>>>> + unsafe.getAndAddLong(null, db.address(), value);
>>>>>>>>> + }
>>>>>>>>> + else {
>>>>>>>>> + synchronized (this) {
>>>>>>>>> + long res = lb.get(0) + value;
>>>>>>>>> - lb.put(0, res);
>>>>>>>>> + lb.put(0, res);
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Testing the PerfCounter.increment() method in a loop on multiple
>>>>>>>>> threads
>>>>>>>>> sharing the same PerfCounter instance, for example, on a 4-core
>>>>>>>>> Intel i7
>>>>>>>>> machine produces the following results:
>>>>>>>>>
>>>>>>>>> #
>>>>>>>>> # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 8
>>>>>>>>> #
>>>>>>>>> 1 threads, Tavg = 19.02 ns/op (σ = 0.00 ns/op)
>>>>>>>>> 2 threads, Tavg = 109.93 ns/op (σ = 6.17 ns/op)
>>>>>>>>> 3 threads, Tavg = 136.64 ns/op (σ = 2.99 ns/op)
>>>>>>>>> 4 threads, Tavg = 293.26 ns/op (σ = 5.30 ns/op)
>>>>>>>>> 5 threads, Tavg = 316.94 ns/op (σ = 6.28 ns/op)
>>>>>>>>> 6 threads, Tavg = 686.96 ns/op (σ = 7.09 ns/op)
>>>>>>>>> 7 threads, Tavg = 793.28 ns/op (σ = 10.57 ns/op)
>>>>>>>>> 8 threads, Tavg = 898.15 ns/op (σ = 14.63 ns/op)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With the presented patch, the results are a little better:
>>>>>>>>>
>>>>>>>>> #
>>>>>>>>> # PerfCounter_increment: run duration: 5,000 ms, #of logical CPUS: 8
>>>>>>>>> #
>>>>>>>>> # Measure:
>>>>>>>>> 1 threads, Tavg = 5.22 ns/op (σ = 0.00 ns/op)
>>>>>>>>> 2 threads, Tavg = 34.51 ns/op (σ = 0.60 ns/op)
>>>>>>>>> 3 threads, Tavg = 54.85 ns/op (σ = 1.42 ns/op)
>>>>>>>>> 4 threads, Tavg = 74.67 ns/op (σ = 1.71 ns/op)
>>>>>>>>> 5 threads, Tavg = 94.71 ns/op (σ = 41.68 ns/op)
>>>>>>>>> 6 threads, Tavg = 114.80 ns/op (σ = 32.10 ns/op)
>>>>>>>>> 7 threads, Tavg = 136.70 ns/op (σ = 26.80 ns/op)
>>>>>>>>> 8 threads, Tavg = 158.48 ns/op (σ = 9.93 ns/op)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The scalability is not much better, but the raw speed is, so it
>>>>>>>>> might
>>>>>>>>> present less contention when used in real-world code. If you
>>>>>>>>> wanted
>>>>>>>>> even
>>>>>>>>> better scalability, there is a new class in JDK8, the
>>>>>>>>> java.util.concurrent.LongAdder. But that doesn't buy atomic
>>>>>>>>> "set()" -
>>>>>>>>> only "add()". And it can't update native-memory variables, so it
>>>>>>>>> could
>>>>>>>>> only be used for add-only counters and in conjunction with a
>>>>>>>>> background
>>>>>>>>> thread that would periodically flush the sum to the native
>>>>>>>>> memory....
>>>>>>>>>
>>>>>>>>> Regards, Peter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 02/08/2013 06:10 PM, Nils Loodin wrote:
>>>>>>>>>> It would be interesting to know the number of thrown
>>>>>>>>>> throwables in
>>>>>>>>>> the
>>>>>>>>>> JVM, to be able to do some high level application diagnostics /
>>>>>>>>>> statistics. A good way to put this number would be a performance
>>>>>>>>>> counter, since it is accessible both from Java and from the VM.
>>>>>>>>>>
>>>>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
>>>>>>>>>> http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Nils Loodin
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>