Scene graph performance
Felix Bembrick
felix.bembrick at gmail.com
Thu Jul 21 18:28:31 UTC 2016
Hi Richard,
Wow! Thanks - you really know your stuff!
Yes, it's not a "one LOC change" and I get that it has a lot to do with the difficult marriage of 2D and 3D worlds.
And it does seem like a large project in itself to solve these problems, but I really believe it *has* to be done, and I intend to do whatever I can to achieve at least a linear relationship between CPU/GPU cores/grunt and JavaFX performance.
At the moment, it doesn't seem to matter *what* hardware you throw at it; the JavaFX scene graph performance is almost static.
The issues you have highlighted here will be very useful indeed in making this happen.
I think I might have mentioned to you privately that the Qt rendering pipeline had similar problems but has been greatly optimised in the last couple of releases.
I paid very close attention to the issues and how they resolved them and I'm sure many of those techniques could be applied in this scenario.
Anyway - it's certainly worth a try!
Felix
> On 22 Jul 2016, at 02:41, Richard Bair <richard.bair at oracle.com> wrote:
>
> Have you guys profiled the application to see where the CPU time is spent? How many nodes in the app?
>
> In the past the majority of CPU time has been spent in one or more of the following (not sure if it still applies):
> - Computing changed bounds (a lot of work was done to speed this up, but I think it was always a single thread doing the work)
> - Synchronizing state to the render graph (a lot of work was done here too, so I wouldn’t expect this to be the problem area)
> - Walking the render graph to render shapes
> - Rasterization (A lot of optimization went here too, but it is still essentially a CPU bound operation)
> - Reading results back from the card (this is sometimes the culprit when it is slow on old hardware and fast on new hardware)
>
> These are all CPU bound tasks.
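>
> (If you want to see how those stages break down per frame in your own runs, the pulse logger is the quickest tool I know of; if I remember the flags correctly it is something like the following, with YourApp.jar standing in for the FXMark build:
>
>     java -Djavafx.pulseLogger=true -Dprism.verbose=true -jar YourApp.jar
>
> That prints a per-pulse timing breakdown covering the CSS, layout, bounds, synchronization and painting phases, and prism.verbose also reports which Prism pipeline was actually selected.)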
>
> I think there are two angles to look at. First, we always wanted to break the render stage down into some parallel pipeline. Adobe Flex was good at this: they'd saturate every CPU you had during the CPU-intensive rasterization and scene graph computation phases. Depending on your particular test, this might actually be the bottleneck. So the idea here is to optimize the CPU tasks, which will (hopefully) remove the CPU from the bottleneck and allow the GPU to take on more of the burden. You should also do some research or experiments with regard to battery life to make sure using more cores doesn't make things worse (and if it does, have a flag to control the amount of parallelism). You also have to be careful because IIRC (and I may not!) some laptops will kick into a "high performance" mode when there is a lot of CPU activity, which is sometimes what you want and sometimes not. Games are happy to kick you into that mode (and drain your battery faster as a result), whereas business apps rarely want to do that.
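>
> (Just to sketch the shape of that idea - this is not Prism code, only an illustration of fanning the CPU-bound rasterization work out over a pool, with Tile as a hypothetical abstraction for an independent region of the frame:
>
>     import java.util.*;
>     import java.util.concurrent.*;
>
>     /** Illustrative only: rasterize independent tiles of a frame in parallel. */
>     final class ParallelRasterizer {
>         /** Hypothetical tile abstraction; rasterize() is pure CPU work. */
>         interface Tile { int[] rasterize(); }
>
>         private final ExecutorService pool;
>
>         ParallelRasterizer(int parallelism) {          // flag-controlled, per the battery concern above
>             this.pool = Executors.newFixedThreadPool(parallelism);
>         }
>
>         /** Returns the pixel buffers in tile order, ready for the composite/upload step. */
>         List<int[]> rasterizeFrame(List<Tile> tiles) throws Exception {
>             List<Future<int[]>> futures = new ArrayList<>();
>             for (Tile t : tiles) {
>                 Callable<int[]> work = t::rasterize;   // CPU-bound work per tile
>                 futures.add(pool.submit(work));
>             }
>             List<int[]> pixels = new ArrayList<>();
>             for (Future<int[]> f : futures) {
>                 pixels.add(f.get());                   // block until done, preserve tile order
>             }
>             return pixels;
>         }
>     }
>
> The parallelism would be driven by a flag so the battery-life trade-off stays under the app's control.)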
>
> Another angle to look at is more of a research direction in the graphics rendering. We spent quite a lot of time looking into ways to go "shader all the way" and avoid having to use a software rasterizer at all. The state of the art has likely advanced since the last time I looked at it, but at the time there really wasn't anything we could find that was ready for production in terms of producing 2D screens using 3D with the polish of 2D. Also, the scene graph semantics are fundamentally painter's algorithm, since that is what everybody coming from a 2D background is used to. But that's not the way it works in 3D. In 3D you feed a bunch of meshes to the video card and it determines which pixels to render and which are occluded and don't need to be rendered. But when you have multiple geometries at the same z-depth, the card can suffer "z-fighting", where for some pixels it renders the item that should be underneath and for others the item that should be on top. There are techniques to try to overcome this, but at least the last time we looked at it (according to my increasingly dimming memory!) there wasn't a really brilliant solution to the problem. Anti-aliasing and transparency were big problems too.
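>
> (To make the 2D contract concrete - PaintedNode below is a made-up type for illustration, not a JavaFX class:
>
>     import java.util.*;
>
>     /** Made-up 2D node type, purely to illustrate the painter's algorithm. */
>     class PaintedNode {
>         final int order;                // position in the child list
>         PaintedNode(int order) { this.order = order; }
>         void paint() { /* rasterize this node's content */ }
>     }
>
>     class PaintersAlgorithm {
>         /** 2D semantics: draw strictly back-to-front; the last node painted always wins. */
>         static void render(List<PaintedNode> children) {
>             children.sort(Comparator.comparingInt((PaintedNode n) -> n.order));
>             for (PaintedNode n : children) {
>                 n.paint();              // no depth buffer, no per-pixel occlusion test
>             }
>         }
>     }
>
> A depth-buffered 3D pass makes no such ordering promise for coplanar geometry, which is exactly where the z-fighting shows up.)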
>
>>>>> DETOUR
>
> Normal things that you would have in 2D like shadows, text, even rounded rectangles have historically been produced using various 2D algorithms and blend modes and so forth. Most people don't even realize the degree to which their view of what a real 2D screen looks like has been shaped by the techniques that were available for producing those screens. Many (most? all? at least it was so at the time) game developers recognized this and used 2D toolkits with CPU rasterization to produce their 2D screens and then overlaid these on 3D content. The normal way to do this is to render the 2D images in Photoshop or something, slice them up, load the PNGs into the graphics card on app startup and then scale those at render time. This is fine, but in a general purpose toolkit like FX you can't just do that, because we allow programmatic access to the scene graph and people can modify the "images" in real time. So we draw them and cache them and reuse the cached images whenever possible etc. A lot was done in order to try to optimize this part of the problem.
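>
> (On the application side, the closest knob to that caching is the node cache, e.g. something along these lines, where buildExpensivePanel() is just a stand-in for whatever shadowed or blurred subtree the app animates:
>
>     import javafx.scene.CacheHint;
>     import javafx.scene.Node;
>
>     Node panel = buildExpensivePanel();     // placeholder for an expensive 2D subtree
>     panel.setCache(true);                   // keep the rasterized image around
>     panel.setCacheHint(CacheHint.SPEED);    // reuse it while the node is animated
>
> The cached image is thrown away as soon as the content itself changes, which is why programmatic scene-graph edits make this harder than the sliced-PNG approach.)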
>
> When I was benchmarking this stuff, we blew away pretty much everybody who was in the 2D+3D general purpose graphics toolkit world. We never tried to compete with the game vendors (like Unity). We weren’t trying to be a pure 3D scene graph. There was a huge discussion about this early on in FX, as to how to marry the 2D and 3D worlds. Developers in these different lands come at the problem differently, in terms of how they understand their world (y-up or y-down? 0,0 in the middle? Every scene scaled? Or 0,0 in top left and pixel scaled by default? Anti-aliasing?). We decided it was for 2D developers who wanted advanced graphics and animations, and a better toolkit for building apps (not games). We figured that for people who wanted to program games, we were never going to be really compelling without building out a lot of additional support, way beyond just graphics performance. Looking at Unity you can see where we’d have had to go to be a compelling game platform, and obviously Sun and Oracle are not in that business.
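>
> (To make the coordinate question concrete: FX defaults to 0,0 in the top left with y growing downward, and a 3D-style centred, y-up space has to be layered on by the app, e.g. with illustrative scene dimensions:
>
>     import javafx.scene.Group;
>     import javafx.scene.transform.Scale;
>     import javafx.scene.transform.Translate;
>
>     double width = 800, height = 600;            // scene size, illustrative values
>     Group world = new Group();
>     world.getTransforms().addAll(
>         new Translate(width / 2, height / 2),    // move the origin to the centre
>         new Scale(1, -1));                       // flip y so it grows upward
>
> Which default you pick colours everything else in the API, which is why that early discussion mattered so much.)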
>
> <<<<< END DETOUR
>
> One of the projects I really wanted to do was to modify Prism to take advantage of multiple cores in the computation / rasterization steps. I think doing so would be a pretty major job and would have to be done quite carefully. My guess is that this would help with the problem you are seeing, but I couldn’t be 100% sure without digging into the details of the benchmark and profile.
>
> Richard
>
>> On Jul 21, 2016, at 4:04 AM, Felix Bembrick <felix.bembrick at gmail.com> wrote:
>>
>> I would add that neither JOGL nor LWJGL have these issues.
>>
>> Yes, I know they are somewhat different "animals", but the point is, clearly *Java* is NOT the cause.
>>
>>> On 21 Jul 2016, at 20:07, Dr. Michael Paus <mp at jugs.org> wrote:
>>>
>>> Hi Felix,
>>> I have written various tests like the ones you use in FXMark and I have
>>> obtained similar results. I have even tried to substitute 2D shapes by
>>> using 3D MeshViews in the hope that this would give better performance
>>> but the results were not that good. Of course all this depends on the
>>> specific test case but in general I see that a JavaFX application which
>>> makes heavy use of graphics animations is completely CPU-bound.
>>> The maximum performance is reached when one CPU/Core is at 100%.
>>> The performance of your graphics hardware seems to be almost irrelevant.
>>> I could, for example, run four instances of the same test at the same time
>>> with almost the same performance. In this case all 4 cores of my machine
>>> were at 100%. This proves that the graphics hardware is not the limiting
>>> factor. My machine is a MacBook Pro with Retina graphics and a dedicated
>>> NVidia graphics card which is already a couple of years old and certainly
>>> not playing in the same league as your high-power card.
>>> I myself have not yet found a way to really speed up the graphics performance
>>> and I am a little bit frustrated because of that. But it is not only the general
>>> graphics performance that is a problem. There are also a lot of other pitfalls
>>> into which you can stumble and which can bring your animations to a halt
>>> or even crash your system. Zooming for example is one of these issues.
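>>>
>>> (For anyone who wants to reproduce this, a trivial AnimationTimer along these lines is enough to watch the frame rate plateau while a single core sits at 100%:
>>>
>>>     import javafx.animation.AnimationTimer;
>>>
>>>     AnimationTimer fps = new AnimationTimer() {
>>>         private long windowStart = 0;
>>>         private int frames = 0;
>>>
>>>         @Override public void handle(long now) {        // called once per pulse
>>>             if (windowStart == 0) { windowStart = now; return; }
>>>             frames++;
>>>             if (now - windowStart >= 1_000_000_000L) {  // one second in nanoseconds
>>>                 System.out.println("FPS: " + frames);
>>>                 frames = 0;
>>>                 windowStart = now;
>>>             }
>>>         }
>>>     };
>>>     fps.start();
>>>
>>> Start it next to the animation under test and the ceiling shows up regardless of the graphics hardware.)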
>>>
>>> I would like to have some exchange on these issues and how to best address
>>> them, but my impression so far is that there are only very few people interested
>>> in that. (I hope someone can prove me wrong on this :-)
>>>
>>> Michael
>>>
>>>> Am 20.07.16 um 04:14 schrieb Felix Bembrick:
>>>> Having written and tested FXMark on various platforms and devices, one
>>>> thing has really struck me as quite "odd".
>>>>
>>>> I started work on FXMark as a kind of side project a while ago and, at the
>>>> time, my machine was powerful but not "super powerful".
>>>>
>>>> So when I purchased a new machine with just about the highest specs
>>>> available including 2 x Xeon CPUs and (especially) 4 x NVIDIA GTX Titan X
>>>> GPUs in SLI mode, I was naturally expecting to see significant performance
>>>> improvements when I ran FXMark on this machine.
>>>>
>>>> But to my surprise, and disappointment, the scene graph animations ran
>>>> almost NO faster whatsoever!
>>>>
>>>> So then I decided to try FXMark on my wife's entry-level Dell i5 PC with a
>>>> rudimentary (single) GPU and, guess what - almost the same level of
>>>> performance (i.e. FPS and smoothness etc.) was achieved on this
>>>> considerably less powerful machine (in terms of both CPU and GPU).
>>>>
>>>> So, it seems there is some kind of "performance wall" that limits the
>>>> rendering speed of the scene graph (and this is with full speed animations
>>>> enabled).
>>>>
>>>> What is the nature of this "wall"? Is it simply that the rendering pipeline
>>>> is not making efficient use of the GPU? Is too much being done on the CPU?
>>>>
>>>> Whatever the cause, I really think it needs to be addressed.
>>>>
>>>> If I can't get better performance out of a machine that scores in the top
>>>> 0.01% of all machines in the world in the 3DMark Index than an entry-level
>>>> PC, isn't this a MAJOR issue for JavaFX?
>>>>
>>>> Blessings,
>>>>
>>>> Felix
>