Canvas performance on Mac OS
Jim Graham
james.graham at oracle.com
Tue Apr 7 23:16:28 UTC 2015
OK, I took the time to put my rMBP on a diet yesterday and find room to
install a 10.10 partition. I get the same numbers for Sierpinski on
10.10, so my theory that something changed in the OGL implementation for
10.10 doesn't hold water.
But, I then tried it using the integrated graphics. I get really poor
performance using the integrated Intel 4000 graphics, but I get great
numbers on the discrete nVidia 650m. It makes sense that the Intel
graphics wouldn't be as powerful as the discrete graphics, but we
shouldn't be taxing it that much to make that big of a difference.
Just to be sure - is that iMac a dual graphics system, or is it
all-AMD-all-the-time? You can see which GPU is being used if you run it
with -Dprism.verbose=true...
...jim
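
For anyone repeating the comparison, here is a minimal sketch of assembling the launch command with the pipeline and verbose flags from this thread. The class PrismLaunchCommand and its build() method are made up for illustration; the classpath and main class are the ones from Chris's runs quoted below, and only the prism.order and prism.verbose properties mentioned in this thread are assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JavaFX or DemoFX.
final class PrismLaunchCommand {

    // Builds the argument list for launching a JavaFX app with a chosen
    // Prism pipeline ("es2" or "sw") and optional verbose Prism logging.
    static List<String> build(String pipeline, boolean verbose,
                              String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Dprism.order=" + pipeline);
        if (verbose) {
            // Verbose mode reports which pipeline and GPU Prism selected.
            cmd.add("-Dprism.verbose=true");
        }
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("es2", true, "target/classes/",
                "com.chrisnewland.demofx.standalone.Sierpinski");
        // Print the command; it could also be handed to a ProcessBuilder.
        System.out.println(String.join(" ", cmd));
    }
}
```

The printed line can be pasted into a shell, or the list passed to new ProcessBuilder(cmd) to run each pipeline in turn.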
On 4/2/15 4:13 PM, Jim Graham wrote:
> On my retina MBP (10.8) I get 60fps for es2 and 44fps for sw. Are you
> running a newer version of MacOS?
>
> ...jim
>
> On 3/31/15 3:40 PM, Chris Newland wrote:
>> Hi Hervé,
>>
>> That's a valid question :)
>>
>> Probably because
>>
>> a) All my non-UI graphics experience is with immediate-mode / raster
>> systems
>>
>> b) I'm interested in using JavaFX for particle effects / demoscene /
>> gaming so assumed (perhaps wrongly?) that scenegraph was not the way
>> to go for that due to the very large number of nodes.
>>
>> Numbers for my Sierpinski filled triangle example:
>>
>> System: 2011 iMac Core i7 3.4GHz / 20GB RAM / AMD Radeon HD 6970M 1024 MB
>>
>> java -Dprism.order=es2 -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
>> fps: 1
>> fps: 23
>> fps: 18
>> fps: 25
>> fps: 18
>> fps: 23
>> fps: 23
>> fps: 19
>> fps: 25
>>
>> java -Dprism.order=sw -cp target/classes/ com.chrisnewland.demofx.standalone.Sierpinski
>> fps: 1
>> fps: 54
>> fps: 60
>> fps: 60
>> fps: 60
>> fps: 60
>> fps: 60
>> fps: 60
>> fps: 60
>> fps: 60
>> fps: 60
>>
>> There are never more than 2500 filled triangles on screen. JDK is
>> 1.8.0_40
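
To give a feel for the triangle counts involved, here is a minimal sketch that assumes a plain recursive midpoint subdivision. That is an assumption and may differ from the linked Sierpinski.java. SierpinskiSketch, Tri and build() are names made up for illustration. Each Tri holds the parallel coordinate arrays that gc.fillPolygon(xPoints, yPoints, nPoints) takes. A depth of 7 gives 3^7 = 2187 triangles, which would fit under the 2500 bound above, but the depth is a guess.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the code from the DemoFX repository.
final class SierpinskiSketch {

    // One filled triangle as parallel x/y arrays (3 points each).
    static final class Tri {
        final double[] xs;
        final double[] ys;

        Tri(double ax, double ay, double bx, double by, double cx, double cy) {
            xs = new double[] {ax, bx, cx};
            ys = new double[] {ay, by, cy};
        }
    }

    // Recursively replaces a triangle with its three corner sub-triangles,
    // leaving the middle one empty; emits a Tri once depth reaches 0.
    static void subdivide(double ax, double ay, double bx, double by,
                          double cx, double cy, int depth, List<Tri> out) {
        if (depth == 0) {
            out.add(new Tri(ax, ay, bx, by, cx, cy));
            return;
        }
        double abx = (ax + bx) / 2, aby = (ay + by) / 2;
        double bcx = (bx + cx) / 2, bcy = (by + cy) / 2;
        double cax = (cx + ax) / 2, cay = (cy + ay) / 2;
        subdivide(ax, ay, abx, aby, cax, cay, depth - 1, out);
        subdivide(abx, aby, bx, by, bcx, bcy, depth - 1, out);
        subdivide(cax, cay, bcx, bcy, cx, cy, depth - 1, out);
    }

    // Builds the triangles for an upright triangle with the given base width.
    static List<Tri> build(double size, int depth) {
        List<Tri> out = new ArrayList<>();
        double height = size * Math.sqrt(3) / 2;
        subdivide(0, height, size, height, size / 2, 0, depth, out);
        return out;
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 8; depth++) {
            System.out.println("depth " + depth + ": "
                    + build(600, depth).size() + " triangles");
        }
    }
}
```

In a Canvas render loop each element would then be drawn with gc.fillPolygon(t.xs, t.ys, 3).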
>>
>> I would say there is a performance problem here? (or at least a need for
>> documentation so as to set expectations for gc.fillPolygon).
>>
>> Best regards,
>>
>> Chris
>>
>>
>>
>>
>> On Tue, March 31, 2015 22:00, Hervé Girod wrote:
>>> Why don't you use Nodes rather than Canvas ?
>>>
>>>
>>> Sent from my iPhone
>>>
>>>
>>>> On Mar 31, 2015, at 22:31, Chris Newland <cnewland at chrisnewland.com>
>>>> wrote:
>>>>
>>>>
>>>> Hi Jim,
>>>>
>>>>
>>>> Thanks, that makes things much clearer.
>>>>
>>>>
>>>> I was surprised how much was going on under the hood of GraphicsContext
>>>> and hoped it was just magic glue that gave the best of GPU
>>>> acceleration where available and immediate-mode-like simple rasterizing
>>>> where not.
>>>>
>>>> I've managed to find an anomaly with GraphicsContext.fillPolygon where
>>>> the software pipeline achieves the full 60fps but ES2 can only manage
>>>> 30-35fps. It uses lots of overlapping filled triangles so I expect it
>>>> suffers from the problem you've described.
>>>>
>>>> SSCCE:
>>>> https://github.com/chriswhocodes/DemoFX/blob/master/src/main/java/com/chrisnewland/demofx/standalone/Sierpinski.java
>>>>
>>>> Was full frame rate canvas drawing an expected use case for JavaFX or
>>>> would I be better off with Graphics2D?
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Chris
>>>>
>>>>
>>>>> On Mon, March 30, 2015 20:04, Jim Graham wrote:
>>>>> Hi Chris,
>>>>>
>>>>>
>>>>>
>>>>> drawLine() is a very simple primitive that can be optimized with a
>>>>> GPU shader. It either looks like a (potentially rotated) rectangle or
>>>>> a rounded rect - and we have optimized shaders for both cases. A large
>>>>> number of drawLine() calls turns into simply accumulating a large
>>>>> vertex list and uploading it to the GPU with an appropriate shader,
>>>>> which is very fast.
>>>>>
>>>>> drawPolygon() is a very complex operation that involves things like:
>>>>>
>>>>> - dealing with line joins between segments that don't exist for
>>>>>   drawLine()
>>>>> - dealing with only rendering common points of intersection once
>>>>>
>>>>> To handle all of that complexity we have to involve a rasterizer that
>>>>> takes the entire collection of lines, analyzes the stroke attributes
>>>>> and interactions and computes a coverage mask for each pixel in the
>>>>> region. We do that in software currently for all pipelines.
>>>>>
>>>>> For the ES2 pipeline, line vs. poly is dominated by pure GPU vs. CPU
>>>>> path rasterization.
>>>>>
>>>>> For the SW pipeline, drawLine is a simplified case of drawPolygon and
>>>>> so the overhead of lots of calls to drawLine() dominates its
>>>>> performance.
>>>>>
>>>>> I would expect ES2 to blow the SW pipeline out of the water with
>>>>> drawLine() performance (as long as there are no additional rendering
>>>>> primitives interspersed in the set of lines).
>>>>>
>>>>> But, both should be on the same footing for the drawPolygon case.
>>>>> Does the ES2 pipeline compare similarly to (hopefully better than) the
>>>>> SW pipeline for the polygon case?
>>>>>
>>>>> One thing I noticed is that we have no optimized case for drawLine()
>>>>> on the SW pipeline. It generates a path containing a single MOVETO
>>>>> and LINETO and feeds it to the generalized path rasterizer when it
>>>>> could instead compute the rounded/square rectangle and render it more
>>>>> directly. If we added that support then I'd expect the SW pipeline to
>>>>> perform the set of drawLine calls faster than drawPolygon as well...
>>>>>
>>>>> ...jim
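
As an illustration of the "rotated rectangle" view of a stroked line described above, here is a minimal geometric sketch. It is not Prism's implementation. StrokeQuadSketch and lineToQuad() are made-up names, and it only covers butt and square caps, not the rounded-rect case.

```java
// Hypothetical sketch of the geometry only; not taken from the Prism sources.
final class StrokeQuadSketch {

    // Returns the four corners {x0,y0, x1,y1, x2,y2, x3,y3} of the rectangle
    // covered by a line from (x1,y1) to (x2,y2) stroked with the given width.
    // With squareCap the rectangle is extended by half the width at both ends;
    // otherwise it stops exactly at the end points (butt cap).
    static double[] lineToQuad(double x1, double y1, double x2, double y2,
                               double width, boolean squareCap) {
        double dx = x2 - x1;
        double dy = y2 - y1;
        double len = Math.hypot(dx, dy);
        if (len == 0) {
            // Degenerate line: pick an arbitrary direction along the x axis.
            dx = 1;
            dy = 0;
            len = 1;
        }
        double ux = dx / len;        // unit vector along the line
        double uy = dy / len;
        double half = width / 2;
        double ext = squareCap ? half : 0;

        // End points, pushed outwards along the line for square caps.
        double ax = x1 - ux * ext, ay = y1 - uy * ext;
        double bx = x2 + ux * ext, by = y2 + uy * ext;

        // Offset perpendicular to the line, half the stroke width long.
        double nx = -uy * half;
        double ny = ux * half;

        return new double[] {
            ax + nx, ay + ny,
            bx + nx, by + ny,
            bx - nx, by - ny,
            ax - nx, ay - ny
        };
    }

    public static void main(String[] args) {
        double[] q = lineToQuad(0, 0, 10, 0, 2, false);
        System.out.println(java.util.Arrays.toString(q));
    }
}
```

Four corners per line is what makes it cheap to batch many lines into one vertex list, whereas a polygon stroke also needs the joins between segments resolved.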
>>>>>
>>>>>
>>>>>
>>>>>> On 3/28/15 3:22 AM, Chris Newland wrote:
>>>>>>
>>>>>>
>>>>>> Hi Robert,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I've not filed a Jira yet as I was hoping to find time to
>>>>>> investigate thoroughly but when I saw your question I thought I'd
>>>>>> better add my findings.
>>>>>>
>>>>>> I believe the issue is in the ES2Pipeline as if I run with
>>>>>> -Dprism.order=sw then strokePolygon outperforms the series of
>>>>>> strokeLine commands as expected:
>>>>>>
>>>>>> java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m line
>>>>>> Result: 44fps
>>>>>>
>>>>>>
>>>>>>
>>>>>> java -cp target/DemoFX.jar -Dprism.order=sw com.chrisnewland.demofx.DemoFXApplication -c 500 -m poly
>>>>>> Result: 60fps
>>>>>>
>>>>>>
>>>>>>
>>>>>> Will see if I can find the root cause as I've got plenty more
>>>>>> examples where ES2Pipeline performs horribly on my Mac which should
>>>>>> have no problem throwing around a few thousand polys.
>>>>>>
>>>>>> I realise there's a *lot* of indirection involved in making JavaFX
>>>>>> support such a wide range of underlying graphics systems but I do
>>>>>> think there's a bug here.
>>>>>>
>>>>>> Will file a Jira if I can contribute a bit more than "feels slow"
>>>>>> ;)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Sat, March 28, 2015 10:06, Robert Krüger wrote:
>>>>>>>
>>>>>>>
>>>>>>> This is consistent with what I am observing. Is this something
>>>>>>> that Oracle
>>>>>>> is aware of? Looking at Jira, I don't see that anyone is working
>>>>>>> on this:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> https://javafx-jira.kenai.com/issues/?jql=status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20in%20(macosx)%20%20AND%20labels%20in%20(performance)
>>>>>>>
>>>>>>> Given that one of the main reasons to use JFX for me is to be able
>>>>>>> to develop with one code base for at least OSX and Windows, and
>>>>>>> given the official statement of what JavaFX is for, i.e.
>>>>>>>
>>>>>>> "JavaFX is a set of graphics and media packages that enables
>>>>>>> developers to design, create, test, debug, and deploy rich client
>>>>>>> applications that operate consistently across diverse platforms"
>>>>>>>
>>>>>>> and the fact that this is clearly not the case currently (8u40),
>>>>>>> because as soon as I do something other than simple forms I run into
>>>>>>> performance/quality problems on the Mac, I am a bit unsure what
>>>>>>> to make of all that. Is Mac OSX a second-class citizen as far as
>>>>>>> dev resources are concerned?
>>>>>>>
>>>>>>> Tobi and Chris, have you filed Jira Issues on Mac graphics
>>>>>>> performance that can be tracked?
>>>>>>>
>>>>>>> I will file an issue with a simple test case and hope for the
>>>>>>> best.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 27, 2015 at 11:08 PM, Chris Newland
>>>>>>> <cnewland at chrisnewland.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Possibly related:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I can reproduce a massive (90%) performance drop on OSX between
>>>>>>>> drawing a wireframe polygon on a Canvas using a series of
>>>>>>>> gc.strokeLine(double x1, double y1, double x2, double y2)
>>>>>>>> commands versus using a single gc.strokePolygon(double[]
>>>>>>>> xPoints, double[] yPoints, int count) command.
>>>>>>>>
>>>>>>>> Creating the polygons manually with strokeLine() is
>>>>>>>> significantly faster using the ES2Pipeline on OSX.
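
To make the two approaches concrete, here is a minimal sketch of how the point arrays passed to strokePolygon(xPoints, yPoints, count) map onto the series of strokeLine(x1, y1, x2, y2) calls. PolySegments and toSegments() are made-up names, not DemoFX code. The two renderings are not pixel-identical: separate lines have no joins between segments.

```java
// Hypothetical helper for illustration; not taken from DemoFX or JavaFX.
final class PolySegments {

    // Converts a closed polygon given as parallel coordinate arrays into
    // 'count' segments {x1, y1, x2, y2}; the last segment closes the shape
    // by connecting the final point back to the first.
    static double[][] toSegments(double[] xPoints, double[] yPoints, int count) {
        if (count < 3 || xPoints.length < count || yPoints.length < count) {
            throw new IllegalArgumentException("need at least 3 points");
        }
        double[][] segments = new double[count][];
        for (int i = 0; i < count; i++) {
            int next = (i + 1) % count;   // wraps around to close the polygon
            segments[i] = new double[] {
                xPoints[i], yPoints[i], xPoints[next], yPoints[next]
            };
        }
        return segments;
    }

    public static void main(String[] args) {
        double[] xs = {0, 10, 5};
        double[] ys = {0, 0, 8};
        for (double[] s : toSegments(xs, ys, 3)) {
            // In a Canvas this would be gc.strokeLine(s[0], s[1], s[2], s[3]).
            System.out.println(java.util.Arrays.toString(s));
        }
    }
}
```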
>>>>>>>>
>>>>>>>> This is reproducible in a little GitHub JavaFX benchmarking
>>>>>>>> project I've
>>>>>>>> created: https://github.com/chriswhocodes/DemoFX
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Build with ant
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Run with:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> # use strokeLine
>>>>>>>> ./run.sh -c 5000 -m line
>>>>>>>> result: 60 (sixty) fps
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> # use strokePolygon
>>>>>>>> ./run.sh -c 5000 -m poly
>>>>>>>> result: 6 (six) fps
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> System is 2011 iMac 27" / Mavericks / 3.4GHz Core i7 / 20GB RAM /
>>>>>>>> Radeon 6970M 1024MB
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Looking at the code paths in
>>>>>>>> javafx.scene.canvas.GraphicsContext:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> gc.strokeLine() maps to writeOp4(x1, y1, x2, y2,
>>>>>>>> NGCanvas.STROKE_LINE)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> gc.strokePolygon() maps to writePoly(xPoints, yPoints, nPoints,
>>>>>>>> true, NGCanvas.STROKE_PATH) which involves significantly more
>>>>>>>> work with adding to and flushing a GrowableDataBuffer.
>>>>>>>>
>>>>>>>> I've not had time to dig any deeper than this but it's surely a
>>>>>>>> bug when building a poly manually is 10x faster than using the
>>>>>>>> convenience method.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, March 27, 2015 21:26, Tobias Bley wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> In my opinion the whole graphics performance on MacOSX
>>>>>>>>> isn’t good at all with JavaFX….
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 27.03.2015 at 22:10, Robert Krüger
>>>>>>>>>> <krueger at lesspain.de> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The bad full screen performance is without the arcs. It is
>>>>>>>>>> just one call to fillRect, two to strokeOval and one to
>>>>>>>>>> fillOval, that's all. I will build a simple test case and
>>>>>>>>>> file an issue.
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 27, 2015 at 9:58 PM, Jim Graham
>>>>>>>>>> <james.graham at oracle.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hi Robert,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please file a Jira issue with a simple test case. Arcs
>>>>>>>>>>> are handled as a generalized shape rather than via a
>>>>>>>>>>> predetermined shader, but it shouldn't be that slow.
>>>>>>>>>>> Something else may
>>>>>>>>>>> be going on.
>>>>>>>>>>>
>>>>>>>>>>> Another test might be to replace the arcs with rectangles
>>>>>>>>>>> or ellipses and see if the performance changes...
>>>>>>>>>>>
>>>>>>>>>>> ...jim
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/27/15 1:52 PM, Robert Krüger wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have a super-simple animation implemented using
>>>>>>>>>>>> AnimationTimer and Canvas where the canvas just performs a
>>>>>>>>>>>> few draw operations, i.e. fills the screen with a color and
>>>>>>>>>>>> then draws and fills 2-3 circles, and I have already
>>>>>>>>>>>> observed that each drawing operation I add results in
>>>>>>>>>>>> significant CPU load (e.g. when I draw < 10 arcs in
>>>>>>>>>>>> addition to the circles, the CPU load goes up to 30-40%
>>>>>>>>>>>> on a MacBook Pro for a Canvas size of 600x600(!)).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Now I tested the animation in full screen mode (only
>>>>>>>>>>>> with a few circles) and playback is unusable for a
>>>>>>>>>>>> serious application (very choppy). Is 2D canvas
>>>>>>>>>>>> performance known to be very bad on Mac or
>>>>>>>>>>>> am I doing something wrong? Are there workarounds for
>>>>>>>>>>>> this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Robert
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Robert Krüger
>>>>>>>>>> Managing Partner
>>>>>>>>>> Lesspain GmbH & Co. KG
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> www.lesspain-software.com
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Robert Krüger
>>>>>>> Managing Partner
>>>>>>> Lesspain GmbH & Co. KG
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> www.lesspain-software.com
>>>>
>>>>
>>>
>>
>>
More information about the openjfx-dev
mailing list