A different way to handle pulse timing
Artem Ananiev
artem.ananiev at oracle.com
Fri Aug 2 05:59:44 PDT 2013
On 8/1/2013 11:52 PM, Richard Bair wrote:
>> as far as I can read it, your idea is to start preparing the next
>> frame right after synchronization (scene graph to render tree) is
>> completed for the previous frame. Do I understand that correctly? If
>> so, we'll likely re-introduce the old problem of input event
>> starvation. There will be little or no window in which events can be
>> processed on the event thread, because the thread will always be
>> either busy handling CSS, animations, etc., or blocked waiting for
>> the render thread to finish rendering.
>
> I think the difference is that I was going to use the vsync as the
> limiter. That is, the first time through we do a pulse, then we
> schedule another pulse, then we run that other pulse (almost
> immediately), then we hit the sync point with the render thread and
> have to wait for it because it is blocked on vsync. Meanwhile the
> user events are being queued up. When we get back from this, the next
> pulse is placed at the end of the queue, so we process all pending
> input events and then the next pulse.
Now I see the picture.

As I wrote in my previous email, it seems that we are not currently
blocked waiting for vsync, at least on Windows with the D3D pipeline.
In any case, even if we "fix" that, under your proposal both threads
will sometimes be blocked at once (the render thread waiting for vsync,
and the event thread waiting for the render thread), which doesn't
sound ideal.
Note that on Windows and Mac OS X, input events and application
runnables are handled differently at the native level (either via
different mechanisms or with different priorities). To implement this
proposal, we would need to eliminate that difference, which may be a
difficult task.
Thanks,
Artem
>>> Whenever an animation starts, the runningAnimationCounter is
>>> incremented. When an animation ends, it is decremented (or it could
>>> be a Set<Animation> or whatever). The pendingPulse flag starts out
>>> false and is checked before we submit another pulse. Whenever a
>>> node in the scene graph becomes dirty, or the scene is resized, or
>>> stylesheets are changed, or in any case something happens that
>>> requires us to draw again, we check this flag and fire a new pulse
>>> if one is not already pending.
>>
>> The scene graph is only changed on the event thread. So my guess is that "fire a new pulse" is just
>>
>> Platform.runLater(() -> pulse())
>
> Right.
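
To make that concrete, here is a minimal sketch of how such a guard
could look. The class and method names (PulseScheduler, requestPulse)
are made up for illustration; this is not the actual Quantum code:

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicInteger;
    import javafx.application.Platform;

    // Illustrative sketch only; not the real Quantum implementation.
    final class PulseScheduler {
        private final AtomicBoolean pendingPulse = new AtomicBoolean(false);
        private final AtomicInteger runningAnimationCounter = new AtomicInteger(0);

        // Called whenever a node becomes dirty, the scene is resized,
        // stylesheets change, etc. Safe to call repeatedly: at most
        // one pulse is queued at a time.
        void requestPulse() {
            if (pendingPulse.compareAndSet(false, true)) {
                Platform.runLater(this::pulse);
            }
        }

        void animationStarted() { runningAnimationCounter.incrementAndGet(); }
        void animationStopped() { runningAnimationCounter.decrementAndGet(); }

        private void pulse() {
            // animations, CSS, layout, synchronization; see below
        }
    }

An AtomicBoolean is used here because runLater may be invoked from
background threads; if pulses were only ever requested from the event
thread, a plain boolean would do.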
>
>>> When a pulse occurs, we process animations first, then CSS, then
>>> layout, then validate all the bounds, and *then we block* until the
>>> rendering thread is available for synchronization. I believe this is
>>> what we are doing today (it was a change Steve and I looked at with
>>> Jasper a couple months ago IIRC).
>>>
>>> But now for the new part. Immediately after synchronization, we check
>>> the runningAnimationCounter. If it is > 0, then we fire off a new
>>> pulse and leave the pendingPulse flag set to true. If
>>> runningAnimationCounter == 0, then we flip pendingPulse to false.
>>> Other than the pick that always happens at the end of the pulse, we
>>> do nothing else new and, if the pick didn't cause state to change, we
>>> are now quiescent.
>>>
>>> Meanwhile, the render thread has run off doing its thing. The last
>>> step of rendering is the present, where we will block until the frame
>>> is presented, which, when we return, would put us *immediately* at
>>> the start of the next 16.66ms cycle. Since the render thread has just
>>> completed its duties, it goes back to waiting until the FX thread
>>> comes around asking to sync up again.
>>>
>>> If there is an animation going on such that a new pulse had been
>>> fired immediately after synchronization, then that new pulse would
>>> have been handled while the previous frame was being rendered. Most
>>> likely, by the time the render thread completes presenting and comes
>>> back to check with the FX thread, it will find that the FX thread is
>>> already waiting for it with the next frame's data. It will synchronize
>>> immediately and then carry on rendering another frame.
>>
>> Given that you propose to fire a new pulse() whenever anything changes in the scene graph, and also right after synchronization, there is no longer any need for an external timer (QuantumToolkit.pulseTimer()).
>
> Correct.
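
Putting the pieces together, the pulse body described above might look
roughly like this, continuing the earlier sketch. The phase methods
(processAnimations(), processCSS(), and so on) are placeholders for
the real phases, not actual API:

    // Runs on the FX application thread. Fills in pulse() from the
    // sketch above; the phase methods are placeholders.
    private void pulse() {
        processAnimations();          // advance active timelines first
        processCSS();                 // then apply CSS to dirty nodes
        processLayout();              // then run layout
        validateBounds();             // then validate all the bounds

        // Block until the render thread is available, then hand the
        // frame over (scene graph to render tree synchronization).
        synchronizeWithRenderThread();

        // The new part: immediately after synchronization, re-fire if
        // any animation is still running; otherwise go quiescent.
        // There is no external pulse timer anywhere; the render
        // thread's present()/vsync paces the whole loop.
        if (runningAnimationCounter.get() > 0) {
            Platform.runLater(this::pulse);  // pendingPulse stays true
        } else {
            pendingPulse.set(false);
        }

        pick();  // the pick that always happens at the end of the pulse
    }

If the pick at the end changes state, requestPulse() fires again and
the cycle continues; otherwise the system stays quiescent until the
next scene graph change.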
>
>>> I think the way this would behave is that, when an animation is first
>>> played, you will get two pulses close to each other. The first pulse
>>> will do its business and then synchronize and then immediately fire
>>> off another pulse. That next pulse will then also get processed and
>>> then the FX thread will block until the previous frame finishes
>>> rendering. During this time, additional events (either application
>>> generated via runLater calls happening on background threads, or from
>>> OS events) will get queued up. Between pulse #2 and pulse #3, a
>>> bunch of other events will get processed, essentially playing
>>> catch-up. My guess is that this won't be a problem, but you might
>>> see a hiccup at the start of a new animation if the event queue is
>>> too full and it can't process all that stuff in 16ms (because at
>>> this point we're truly multi-threaded between the FX and render
>>> threads and have nearly 16ms for each thread to do its business,
>>> instead of the 8ms you'd have in a single-threaded system).
>>>
>>> Another question I have is around resize events and how those work.
>>> If they also come into Glass on the FX thread (but at a higher
>>> priority than user events like a pulse or other input events?),
>>> then what will happen is that we will get a resize event, process
>>> half a pulse (or maybe a whole pulse? animations+css+layout, or
>>> just css+layout?), and then render, pretty much as fast as we can.
>>>
>>> As for multiple scenes, I'm actually curious how this happens today.
>>> If I have 2 scenes, and we have just a single render thread servicing
>>> both, then when I go to present, it blocks? Or is there a
>>> non-blocking present method that we use instead? Because if we block,
>>> then having 2 scenes would cut you down to 30fps maximum, wouldn't it?
>>
>> This is a very interesting question... Experiments show that we can have more than one window/scene running at 60 fps. Someone from the graphics team should comment on this. My only guess (at least in the case of the D3D pipeline) is that present() doesn't block if it's called no more than once between vsyncs (but the frame is still shown on the screen at the next vsync).
>
> And certainly with full-speed enabled we aren't blocking. The way I guess this would have to work is that if we have 10 scenes to render, we will end up rendering them all and then only block on the last render. My goal is to use the video card as the timer, in essence, such that:
>
> a) We have a full 16.666ms for the render thread to do its business each time and
> b) We never have "timer drift" between some external timer and the vsync timer
>
> There is potentially another benefit, which is that we don't ever need to enable hi-res timers on Windows or whatnot, and never put the machine in a "non-power-save" state.
>
> Richard
>
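
For what it's worth, if present() really does stay non-blocking until
the vsync budget is used up (which is only a guess above), the render
thread's loop over multiple scenes could be sketched like this. All
names here are invented for illustration; this is not actual
Prism/Quantum code:

    import java.util.List;

    // Hypothetical render-thread loop, under the guesses discussed
    // above about present() blocking behavior.
    abstract class RenderLoop {
        private volatile boolean running = true;

        abstract void waitForSyncFromFxThread(); // handshake with FX thread
        abstract void render(Object scene);
        abstract void present(Object scene);     // assumed non-blocking
                                                 // until vsync budget is spent

        void run(List<?> scenes) {
            while (running) {
                waitForSyncFromFxThread();       // pick up next frame's data
                for (Object scene : scenes) {
                    render(scene);
                    present(scene);
                }
                // Per the guesses above, only the last present actually
                // waits for vsync, so several scenes can still run at
                // 60 fps and vsync itself acts as the timer.
            }
        }
    }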