JNI-performance - Is it really that fast?

Tue Mar 25 23:13:27 UTC 2008

Hi Dave,

Thanks a lot for such a detailed answer. Congratulations on the
BiasedLocking work, it's really great to see such innovative features in
the JVM :)

>  good choice when "synchronized" doesn't fit the bill, such as when you
>  might need timed waits, trylock, hand-over-hand "coupled" locking,
>  etc.   ReentrantLock also tends to be used in situations where the
>  programmer is sure multiple threads are actively coordinating their
>  operation, meaning that ReentrantLock would benefit little from biased
>  locking.  For most synchronization -- contended or uncontended --
>  you're better off with synchronized as you get the benefits of biased
>  locking, adaptive spinning, potential lock elision via escape
>  analysis, and in the future, hardware transactional lock elision (http://blogs.sun.com/dave/entry/rock_style_transactional_memory_lock
>  ).

I was asking because I did some benchmarking and (on my dual-core
machine, with an obscure microbenchmark) grabbing the AWTLock for
a 1x1 rectangle takes almost as much time as the whole Xlib processing
+ JNI overhead.

The code looks like:
SunToolkit.awtLock();
long xgc = validate(sg2d); //simple, pure Java method
XFillRect(sg2d.surfaceData.getNativeOps(), .....); // native method
SunToolkit.awtUnlock();

10 million 1x1 rects:
600ms native method commented out
850ms locking commented out.
1400ms locking+native method

The numbers include the whole code path from Graphics.fillRect() down to
X11Renderer.fillRect.

As you can see, locking (at least on my machine) is almost as expensive
as the JNI downcall and the real work together. I used the
server compiler.
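
If the three timings above are roughly additive, about 1400 - 850 =
550ms goes to locking and about 1400 - 600 = 800ms to the native call.
To see how an uncontended lock/unlock pair behaves in isolation,
something like the sketch below could be used. It is not the AWT code:
LockCostSketch, MONITOR, LOCK and counter are hypothetical names, and the
loop body is a trivial stand-in for validate() + XFillRect(). Results of
such microbenchmarks depend heavily on the JIT, which may coarsen or
elide locks, so the numbers are only a rough hint.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical microbenchmark sketch: uncontended synchronized vs.
// uncontended ReentrantLock around a trivial piece of work.
class LockCostSketch {
    static final Object MONITOR = new Object();            // stand-in monitor
    static final ReentrantLock LOCK = new ReentrantLock(); // stand-in for the AWT lock
    static long counter;

    // Increments the counter under a synchronized block.
    static long runSynchronized(int iterations) {
        counter = 0;
        for (int i = 0; i < iterations; i++) {
            synchronized (MONITOR) {
                counter++;
            }
        }
        return counter;
    }

    // Increments the counter under a ReentrantLock, always unlocking in finally.
    static long runReentrantLock(int iterations) {
        counter = 0;
        for (int i = 0; i < iterations; i++) {
            LOCK.lock();
            try {
                counter++;
            } finally {
                LOCK.unlock();
            }
        }
        return counter;
    }

    // Wall-clock time of one run, in milliseconds.
    static long timeMillis(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        // Warm-up so that both loops are compiled before they are measured.
        runSynchronized(n);
        runReentrantLock(n);
        System.out.println("synchronized:  " + timeMillis(() -> runSynchronized(n)) + " ms");
        System.out.println("ReentrantLock: " + timeMillis(() -> runReentrantLock(n)) + " ms");
    }
}
```

Running it once with and once without -XX:-UseBiasedLocking (on a JVM
that still supports that flag) should show how much of the synchronized
result comes from biasing.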

The AWTLock was a Java monitor until JDK 5 (not 100% sure), but it
suffered from contention because it was also used from native code and
sometimes from multiple threads (though I guess it was not heavily
contended in most cases).
In JDK 6 it was replaced with a ReentrantLock; some features like
tryLock() were used to implement the new OpenGL pipeline ...
performance also improved.
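
For readers who have not used it, tryLock() is the kind of operation a
plain monitor cannot offer. A minimal sketch of the idea follows; it is
not the actual AWT/OpenGL pipeline code, and TryLockSketch, RENDER_LOCK,
renderIfFree and renderWithin are hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of non-blocking and timed lock acquisition.
class TryLockSketch {
    static final ReentrantLock RENDER_LOCK = new ReentrantLock(); // stand-in for the AWT lock

    // Runs the work only if the lock is free right now (or already held by
    // this thread); returns false instead of blocking otherwise.
    static boolean renderIfFree(Runnable work) {
        if (!RENDER_LOCK.tryLock()) {
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            RENDER_LOCK.unlock();
        }
    }

    // Waits up to timeoutMillis for the lock before giving up.
    static boolean renderWithin(long timeoutMillis, Runnable work)
            throws InterruptedException {
        if (!RENDER_LOCK.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            RENDER_LOCK.unlock();
        }
    }
}
```

With synchronized, a thread that finds the monitor taken can only
block; here the caller can skip the work or retry later.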

>  In your case if the lock is ever shared -- that is, locked by multiple
>  threads during its lifetime -- then biased locking probably won't
>  provide the latency reduction benefit you're after.   The object will
>  likely become unbiased at some point.  I suspect that sharing will
>  ultimately occur in your case, but be infrequent, correct?

Exactly, the most likely scenario is that there is one rendering thread
which does millions of lock operations, plus a few other calls from
native code (currently they upcall from C to lock the ReentrantLock).
It could also happen that there are two or more active rendering
threads at the same time, but this is not really common and a fallback
to unbiased would be totally ok.
Wouldn't a BiasedLock be something worth implementing, maybe with the
possibility to tune how fast/likely the lock becomes unbiased?

However, this really does not have a high priority for me ... I really
should care about other things ... somehow I got trapped in this when
deciding on the design of my XRender-Java2D pipeline. Sorry for all the
traffic...

Regards, Clemens