Paint Phase Debugging / Performance

Tue May 1 18:48:03 UTC 2018

Hi Matthew,

I agree this is a very important question.

I don't like it when people say that "JavaFX is slow" because if everything
is done right, JavaFX can be extremely fast.
But I also realise that it is very easy to make things very slow in JavaFX,
and in case "performance" is slow, it is important to pinpoint the problem
as easy as possible. And that is often not trivial, so the easier it
becomes to detect problems, the more developers will like JavaFX.

I typically start with setting prism.verbose and javafx.pulseLogger to
true, as you first need to know in which phase the problem is (e.g. layout
or render phase). If you see the problem is in the render phase, you have
to look at completely different things.
Note that in most cases I've seen with bad performance, the problem was not
in the rendering, but simply due to too much happening on the JavaFX
application thread, preventing it to start doing the rendering.
Those things are typically easily detected by generating thread dumps or
using simple profilers to find out what the JavaFX thread is typically
doing. If you see bad performance but in 90% of the thread dump there is no
trace of rendering, you know where to look.

When things are slow in the paint phase, the information about how many
Nodes are visited and rendered, and how many are cached is very useful. I
have a local debug version where I keep track of how many times a Node is
rendered from cache versus how often the cache is invalid and needs to be
recomputed (which is extremely expensive). It this was somehow automated,
it could improve performance. It is similar to what hotspot C1/C2 is doing
for code: it is expensive to compile all code at runtime, but when it turns
out methods are often used, they are compiled. Similar, if the rendering
pipeline detects that a node would remain cache-valid for most of the time,
it might automatically set that node.setCache(true). But that's not a
trivial thing to implement.

The options in PrismSettings.java (e.g. prism.trace,
prism.printrendergraph, prism.showdirty) and in QuantumToolit
(quantum.debug) can be helpful as well.

When there are no clear indications (e.g. not too many nodes, no invalid
caches), I go for profiling, working bottom-up. I have the JavaFX source
code always at hand when doing this, in order to see what exactly is
happening.
There are some patterns, e.g. on Android I know that lots of time spent in
System.arraycopy is an indication about lots of slow CSS processing (Bitset
operations, if you follow the profiling information bottom-up)

It would be great to have tools that auto-detect this. Detecting slow
render phases is already done, but linking to the root cause is of course
much harder.

I don't think that interrupting the paint phase is a good thing. If that
takes 200-300 ms, it is very likely it will take 200-300ms in the next
cycle.

It might be an option though to dynamically throttle the pulse frequency
(which can now only be changed at startup with javafx.animation.pulse) in
case slow rendering is detected.

- Johan

On Tue, May 1, 2018 at 8:17 PM Matthew Elliot <
matthew.james.elliot at gmail.com> wrote:

> Hi all,
>
> The last few days I was troubleshooting a new performance issue that showed
> up in our PROD application where customers had fallen back to the SW
> rendering pipeline. It severely affected the application where CPU
> frequency was under 3 GHz with hover lags of a few seconds in the worst
> cases. With thousands of potential HW/SW combinations in the wild it took
> quite a while to even identify it really was an issue in our application
> and not the usual noise of some silly set up. All this got me thinking...
>
> ... what was visible was long paint passes, and long waiting on previous
> render but narrowing this down to exactly what was going on took a lot of
> manual inspection of the rendering pipeline code / debugging and somewhat
> by chance I stumbled over the -Dprism.disableEffects flag which after much
> more pain helped me narrow down the issue.
>
> The root cause turned out to be an -fx-effect (blend, inner shadow) on an
> animated node that was set from the code by an unknowing developer.
>
> While there are tools like mission controller for visualising the pulse and
> phases it can be difficult to identify for example what is going wrong
> inside of the painting phase and it is difficult to control that nothing
> bad happens when many developers can make changes to the code and reviews
> will never catch everything... I'm therefore thinking about ways to run
> rendering tests in continuous integration that would fail fast if the SW
> rendering pipeline would get overloaded.
> I had a look at PulseListener where I could see pulse times but I'd like to
> go more detailed and actually like the information tracked in the internals
> of PulseLogger (PulseData) without doing any nasty tricks.
>
> I thought maybe somebody has already thought about this problem before and
> maybe there is even some tooling around this beyond the logging? I could
> even imagine using the same technique to monitor the rendering pipeline in
> real time and alerting us (maybe even the user) if things are going a bit
> sideways.
>
> Maybe more generically, how do you even start to debug delays in the paint
> phase? Timed breakpoints and IDE assisted debug logging aside. :)
>
> Matt.
>
> PS: It might even be nice to tell the Painter to give up after N ms (prism
> setting?). Sometimes better to break than to not be useable because of
> paint phases taking 200-300 ms and JavaFX Application Thread thread getting
> almost starved completely.
>