[OpenJDK 2D-Dev] Thread-Private RenderBuffers for RenderQueue?

Dmitri Trembovetski Dmitri.Trembovetski at Sun.COM
Mon Mar 24 21:44:16 UTC 2008


   Hi Clemens.

Clemens Eisserer wrote:
> Hello,
> 
>>    Since most applications do render from one thread (either the
>>    Event Queue like Swing apps, or some kind of dedicated rendering
>>    thread like games), the lock is indeed very fast, given
>>    biased locking and such.
>>
>>    I would suggest not trying to optimize things - especially tricky
>>    ones which involve locking - until you have
>>    identified with some kind of tool that there's a problem.
> 
> I did some benchmarking to find out the best design for my new
> pipeline, and these are the results I got:
> 
> 10 million solid 1x1 rects, VolatileImage, server compiler,
> Core2Duo 2 GHz, Intel 945GM, Linux:
> 
> 200ms no locking, no native call
> 650ms locking only
> 850ms native call, no locking
> 1350ms as currently implemented in X11Renderer

   Did you mean OGLRenderer? The X11Renderer doesn't use the
   single-threaded rendering model and thus doesn't need a render queue.

   Note that on X11 the render queue lock doubles as the lock guarding
   all X11 access - for both AWT and 2D. We must lock around it because
   both use the same display connection, and X11 is not multi-threaded
   (at least in the way we use it).
   This means the lock is likely to be promoted to a heavyweight lock,
   which is why it is expensive.
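   The overhead gap between a plain operation and one guarded by a
   monitor can be seen with a rough microbenchmark along these lines
   (a sketch for illustration only - this is not the benchmark Clemens
   ran, and all names here are made up):

```java
// Rough sketch of a lock-overhead microbenchmark: compares a plain
// counter increment against one guarded by a monitor. Uncontended,
// the monitor stays cheap; a second thread contending on LOCK would
// promote it to a heavyweight lock and widen the gap.
public class LockOverhead {
    private static final Object LOCK = new Object();
    static long counter;

    static long plainLoop(int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            counter++;
        }
        return System.nanoTime() - t0;
    }

    static long lockedLoop(int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            synchronized (LOCK) {   // uncontended monitor acquire/release
                counter++;
            }
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        plainLoop(n);               // warm-up
        lockedLoop(n);
        System.out.printf("plain:  %d ms%n", plainLoop(n) / 1_000_000);
        System.out.printf("locked: %d ms%n", lockedLoop(n) / 1_000_000);
    }
}
```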

   So the problem with having separate render buffers per thread is that
   at some point you will have to synchronize on SunToolkit.awtLock()
   anyway.
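   In other words, appending to a thread-private buffer can be lock-free,
   but the flush to X11 still funnels through one global lock. A minimal
   sketch of that shape (GLOBAL_X11_LOCK and the class are invented
   stand-ins; the real code would take SunToolkit.awtLock()):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: per-thread buffers avoid locking on enqueue, but the flush
// must still synchronize on one global lock (a stand-in here for the
// AWT lock that guards all X11 access).
public class PerThreadBuffers {
    static final Object GLOBAL_X11_LOCK = new Object();

    // Each thread accumulates commands privately - no lock to append.
    static final ThreadLocal<List<String>> BUFFER =
            ThreadLocal.withInitial(ArrayList::new);

    static void enqueue(String cmd) {
        BUFFER.get().add(cmd);           // thread-private, lock-free
    }

    static List<String> flush() {
        List<String> buf = BUFFER.get();
        synchronized (GLOBAL_X11_LOCK) { // the unavoidable global sync point
            List<String> sent = new ArrayList<>(buf);
            buf.clear();
            return sent;                 // real code would issue X11 calls here
        }
    }

    public static void main(String[] args) {
        enqueue("fillRect 0 0 1 1");
        enqueue("fillRect 1 1 1 1");
        System.out.println(flush());     // both commands drained in one locked flush
    }
}
```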

> I did rendering only from a single thread (however not the EDT), in
> this simple pipeline-overhead test the locking itself is almost as
> expensive as the "real" work (=native call), and far more expensive
> than an "empty" JNI call.
> However this was on a dual-core machine, on my single-core amd64
> machine locking has much less influence. As far as I know biased
> locking is only implemented for monitors.
> Xorg ran on my 2nd core; with locking it was only about 40% busy,
> without locking about 80%.
> 
> However I have to admit there are probably much more important things
> to do than playing with things like that ;)

   You could probably explore ways to improve the current design,
   which only allows a single rendering queue. For example,
   we had discussed the possibility of extending the STR
   (single-threaded rendering) design to allow a rendering thread
   per destination. But again, on Unix it will bump against the
   need to synchronize around X11 access.

   You can also play with having a render buffer per thread, as
   you suggest, but your rendering thread will have to sync for
   reading from each render buffer - presumably on the same lock
   the thread used to put stuff into that buffer.
   All doable, but risky, and it's hard to assess the benefits before
   you have a working implementation. Just commenting out the
   locks gives a wrong impression, since the resulting code
   becomes incorrect and thus the benchmark results can't be
   trusted.
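   A sketch of that buffer-per-thread variant (all class and method
   names invented for illustration): each client thread fills its own
   buffer under that buffer's monitor, and the rendering thread must
   take the same monitor to drain it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a per-thread command buffer: the producer thread and the
// rendering thread synchronize on the buffer itself, so writes made
// before a drain are visible to the renderer.
public class BufferPerThread {
    static final class CommandBuffer {
        private final List<String> cmds = new ArrayList<>();

        void put(String cmd) {
            synchronized (this) {        // producer locks its own buffer
                cmds.add(cmd);
            }
        }

        List<String> drain() {
            synchronized (this) {        // renderer must take the same lock
                List<String> out = new ArrayList<>(cmds);
                cmds.clear();
                return out;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CommandBuffer buf = new CommandBuffer();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 100; i++) buf.put("rect" + i);
        });
        producer.start();
        producer.join();
        // The "rendering thread" (here: main) drains under the buffer's lock.
        System.out.println(buf.drain().size()); // prints 100
    }
}
```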

   Anyway, I would suggest that you look at optimizing
   this later.

>>    If it appears null during a sync() call, no harm is done (the
>>    sync is just ignored - which is fine given that the render queue
>>    hasn't been created yet, so there's nothing to sync), so this is
>>    not a problem.
> But what does happen if it has already been created, but the thread
> calling sync() just does not see the updated "theInstance" value?
> Could there be any problem when sync()-calls are left out?

   If the thread calling sync() sees theInstance as null, this means
   that it could not have anything to sync. If there's no queue,
   it could not have put anything into that queue prior to
   calling sync(). The sync() can be safely ignored.
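   The reasoning can be sketched like this (LazyQueue is a stand-in
   class, not the real sun.java2d render queue): a thread that still
   reads theInstance as null can never have enqueued anything, because
   enqueueing requires getInstance() to have returned a non-null value
   to that same thread first.

```java
// Sketch of the "null instance means nothing to sync" argument:
// theInstance is created lazily, and sync() tolerates a stale null read.
public class LazyQueue {
    private static LazyQueue theInstance;   // lazily created, not volatile

    static synchronized LazyQueue getInstance() {
        if (theInstance == null) {
            theInstance = new LazyQueue();
        }
        return theInstance;
    }

    private int pending;

    synchronized void enqueue() { pending++; }

    // Returns false when there was nothing to sync. A caller that sees
    // null here cannot have put anything into the queue, so skipping is safe.
    static boolean sync() {
        LazyQueue q = theInstance;          // possibly stale read
        if (q == null) {
            return false;                   // no queue, nothing queued, ignore
        }
        synchronized (q) {
            q.pending = 0;                  // flush the queue
        }
        return true;
    }
}
```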

   Thanks,
     Dmitri
