[OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue?

Mon Mar 24 22:23:04 UTC 2008

   Chris pointed out to me that SunToolkit.lock() currently uses
   a ReentrantLock, which is supposed to have better
   characteristics than built-in Java synchronization
   under contention.

   So it would be interesting to see exactly what you were
   measuring, and how.
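
   For reference, a minimal sketch of how the uncontended cost of
   the two lock flavours could be compared from a single thread.
   This is not the measurement discussed in this thread: the class
   name, the counter "work" and the iteration count are all
   assumptions made for illustration, and the JIT may optimize
   such a trivial loop in ways that distort the numbers.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical micro-benchmark sketch; all names here are illustrative only.
class LockOverheadSketch {
    static final Object MONITOR = new Object();
    static final ReentrantLock LOCK = new ReentrantLock();
    static long counter;

    // Baseline: the "work" (a counter increment) with no locking at all.
    static long noLock(int iterations) {
        counter = 0;
        for (int i = 0; i < iterations; i++) {
            counter++;
        }
        return counter;
    }

    // Same work guarded by a built-in monitor.
    static long withMonitor(int iterations) {
        counter = 0;
        for (int i = 0; i < iterations; i++) {
            synchronized (MONITOR) {
                counter++;
            }
        }
        return counter;
    }

    // Same work guarded by a java.util.concurrent ReentrantLock.
    static long withReentrantLock(int iterations) {
        counter = 0;
        for (int i = 0; i < iterations; i++) {
            LOCK.lock();
            try {
                counter++;
            } finally {
                LOCK.unlock();
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        int n = 10000000;
        // Warm-up pass so that the timed pass runs compiled code.
        noLock(n);
        withMonitor(n);
        withReentrantLock(n);

        long t0 = System.nanoTime();
        noLock(n);
        long t1 = System.nanoTime();
        withMonitor(n);
        long t2 = System.nanoTime();
        withReentrantLock(n);
        long t3 = System.nanoTime();

        System.out.println("no lock:       " + (t1 - t0) / 1000000 + " ms");
        System.out.println("synchronized:  " + (t2 - t1) / 1000000 + " ms");
        System.out.println("ReentrantLock: " + (t3 - t2) / 1000000 + " ms");
    }
}
```

   Numbers from a loop like this only describe the uncontended
   case; they say nothing about a lock that is shared with other
   threads.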

   Also, if you're doing any kind of Java2D performance
   testing I would encourage you to use J2DBench as the
   benchmark (it can be found in jdk/src/share/demo/J2DBench).
   You can plug in new tests if the existing ones don't
   match what you want to test.

   Thanks,
     Dmitri

Dmitri Trembovetski wrote:
> 
>   Hi Clemens.
> 
> Clemens Eisserer wrote:
>> Hello,
>>
>>>    Since most applications do render from one thread (either the
>>>    Event Queue like Swing apps, or some kind of dedicated rendering
>>>    thread like games), the lock is indeed very fast, given
>>>    biased locking and such.
>>>
>>>    I would suggest not trying to optimize things - especially tricky
>>>    ones which involve locking - until you have
>>>    identified with some kind of tool that there's a problem.
>>
>> I did some benchmarking to find out the best design for my new
>> pipeline, and these are the results I got:
>>
>> 10 million solid 1x1 rects, VolatileImage, server compiler,
>> Core2Duo 2 GHz, Intel-945GM, Linux:
>>
>> 200ms no locking, no native call
>> 650ms locking only
>> 850ms native call, no locking
>> 1350ms as currently implemented in X11Renderer
> 
>   Did you mean OGLRenderer? The X11Renderer doesn't use the
>   single-threaded rendering model and thus doesn't need a render queue.
> 
>   Note that on X11 the render queue lock doubles as the lock against
>   all X11 access - for both awt and 2d. We must lock around it because
>   we all use the same display, and X11 is not multi-threaded (at
>   least in the way we use it).
>   This means that the lock is likely to be promoted to a heavyweight lock,
>   which is why it is expensive.
> 
>   So the problem with having separate render buffers per thread is that
>   at some point you will have to synchronize on SunToolkit.awtLock()
>   anyway.
> 
>> I did rendering only from a single thread (however not the EDT), in
>> this simple pipeline-overhead test the locking itself is almost as
>> expensive as the "real" work (=native call), and far more expensive
>> than an "empty" JNI call.
>> However, this was on a dual-core machine; on my single-core amd64
>> machine locking has much less influence. As far as I know biased
>> locking is only implemented for monitors.
>> Xorg ran on my 2nd core; with locking it kept that core only about
>> 40% busy, without locking about 80%.
>>
>> However I have to admit there are probably much more important things
>> to do than playing with things like that ;)
> 
>   You probably can explore ways to improve the current design,
>   which only allows a single rendering queue. For example,
>   we had discussed the possibility of extending the STR design
>   to allow a rendering thread per destination. But again,
>   on unix it will bump against the need to sync around X11 access.
> 
>   You can also play with having a render buffer per thread as
>   you suggest, but your rendering thread will have to sync for
>   reading from each render buffer - presumably on the same lock
>   as the thread used to put stuff into that buffer.
>   All doable, but risky, and it is hard to assess the benefits before
>   you have a working implementation. Just commenting out
>   locks gives the wrong impression, since the resulting code
>   becomes incorrect and thus the benchmark results can't be
>   trusted.
> 
>   Anyway, I would suggest that you look at optimizing
>   this later.
> 
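
   The render-buffer-per-thread idea above can be sketched roughly
   as follows. This is only an illustration under stated
   assumptions: SHARED_LOCK stands in for the toolkit-wide lock,
   the buffers hold strings instead of real render ops, and none
   of the names come from the JDK sources. The point is that both
   the producer and the draining side still go through the same
   lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; class, field and method names are illustrative only.
class PerThreadBufferSketch {
    // Stand-in for the single lock that also guards all X11 access.
    static final ReentrantLock SHARED_LOCK = new ReentrantLock();

    // All per-thread buffers, registered and read only under SHARED_LOCK.
    private final List<List<String>> buffers = new ArrayList<>();

    // Each thread lazily gets its own buffer and registers it once.
    private final ThreadLocal<List<String>> local = ThreadLocal.withInitial(() -> {
        List<String> b = new ArrayList<>();
        SHARED_LOCK.lock();
        try {
            buffers.add(b);
        } finally {
            SHARED_LOCK.unlock();
        }
        return b;
    });

    // Producer side: still needs the lock, because the rendering
    // thread may be reading this buffer at the same time.
    void enqueue(String op) {
        List<String> b = local.get();
        SHARED_LOCK.lock();
        try {
            b.add(op);
        } finally {
            SHARED_LOCK.unlock();
        }
    }

    // Rendering-thread side: drains every buffer under the same lock.
    List<String> flushAll() {
        List<String> drained = new ArrayList<>();
        SHARED_LOCK.lock();
        try {
            for (List<String> b : buffers) {
                drained.addAll(b);
                b.clear();
            }
        } finally {
            SHARED_LOCK.unlock();
        }
        return drained;
    }

    // Small demo: two threads enqueue, then one flush collects everything.
    static int demo() {
        PerThreadBufferSketch s = new PerThreadBufferSketch();
        s.enqueue("fillRect");
        Thread t = new Thread(() -> {
            s.enqueue("drawLine");
            s.enqueue("fillRect");
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return s.flushAll().size();
    }
}
```

   Splitting the buffers does not remove the lock from the hot
   path in this sketch; it only changes what the lock protects.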
>>>    If it appears null during a sync() call, no harm is done (the
>>>    sync is just ignored - which is fine given that the render queue
>>>    hasn't been created yet, so there's nothing to sync), so this is
>>>    not a problem.
>> But what happens if it has already been created, but the thread
>> calling sync() just does not see the updated "theInstance" value?
>> Could there be any problem when sync()-calls are left out?
> 
>   If the thread calling sync() sees theInstance as null, this means
>   that it could not have anything to sync. If there's no queue,
>   it could not have put anything into that queue prior to
>   calling sync(). The sync() can be safely ignored.
> 
>   Thanks,
>     Dmitri
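
   The sync() reasoning above can be sketched as follows. This is
   a simplified, hypothetical stand-in and not the actual render
   queue code: only the field name theInstance comes from the
   discussion, everything else is assumed for illustration. A
   thread can only have queued work by first going through
   getInstance(), so a thread that still reads null has nothing
   pending and its sync() can be skipped.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; apart from "theInstance" all names are illustrative.
class RenderQueueSketch {
    // Read without synchronization in sync(), as in the discussion.
    private static RenderQueueSketch theInstance;

    private final Queue<Runnable> pending = new ArrayDeque<>();

    // Lazily creates the single queue; every producer must call this first.
    static synchronized RenderQueueSketch getInstance() {
        if (theInstance == null) {
            theInstance = new RenderQueueSketch();
        }
        return theInstance;
    }

    synchronized void enqueue(Runnable op) {
        pending.add(op);
    }

    // Runs all pending ops and returns how many were executed.
    synchronized int flushNow() {
        int executed = 0;
        Runnable op;
        while ((op = pending.poll()) != null) {
            op.run();
            executed++;
        }
        return executed;
    }

    // If this thread sees no queue, it cannot have enqueued anything
    // itself, so there is nothing to wait for and the call is a no-op.
    static int sync() {
        RenderQueueSketch rq = theInstance;
        if (rq == null) {
            return 0;
        }
        return rq.flushNow();
    }
}
```

   A thread that did enqueue something has already passed through
   the synchronized getInstance(), so in this sketch its later
   read of theInstance cannot come back null.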