iOS: missing native font lib (a) in SDK directory
Richard Bair
richard.bair at oracle.com
Wed Jul 3 12:11:07 PDT 2013
Here is the breakdown of performance issues that I have. The ones I think will lead to decent wins are starred, and Super Shader is triple-starred. This list was pulled from the JIRA filter I previously sent. The point of this post is to give everybody an easy-to-see list of performance-related issues (as of a month ago or so). Some of these might now be done, and the list isn't meant to be comprehensive (although at the time I compiled it I did visit each and every issue labeled "performance", so it was pretty comprehensive!).
Interested in helping out? I'll be glad to give background on any one of these issues and pointers as to how to go about working on any of them.
Richard
Architecture
• *RT-9363*: Consider reducing conversions between 'FX' API and scene graph API
• *RT-24582*: High frequency refresh and Heavy but low priority updates in the same app (multithreaded render, multi instance…)
• *RT-26492*: Use GCC link time optimization to reduce binary size
• *RT-26531*: Provide independent stage performance
• RT-15083: Replace boolean fields with bit fields
• RT-20397: Remove PGNodes
• RT-23470: Replace java.lang.Math usage in places where precision is not as important
• RT-23741: Add a hint to let scene graph and Prism know that we are animating
• RT-23866: Optimize Raspberry PI build for armv6/VFP
• RT-23867: Mac Glass uses gcc -O3 which is known to produce code with large static footprint
• RT-23868: Glass: Consider collapsing Event classes into a single one.
• RT-24238: Analyze property getters
• RT-29861: Consider replacing Math functions with a faster alternative
• RT-29900: Increased CPU usage on application iconified
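To illustrate the kind of change RT-15083 (boolean fields to bit fields) suggests, here is a minimal sketch. The class name NodeFlags and the three flag names are hypothetical and are not taken from the JavaFX sources; the idea is only that several boolean fields, which typically occupy a byte each, can share the bits of one field.

```java
public class NodeFlags {
    // Hypothetical flags; each one occupies a single bit of the same byte.
    static final byte VISIBLE  = 1 << 0;
    static final byte MANAGED  = 1 << 1;
    static final byte DISABLED = 1 << 2;

    // One byte replaces three separate boolean fields.
    private byte flags;

    private void set(byte mask, boolean value) {
        if (value) {
            flags |= mask;   // turn the bit on
        } else {
            flags &= ~mask;  // turn the bit off
        }
    }

    private boolean isSet(byte mask) {
        return (flags & mask) != 0;
    }

    public boolean isVisible()          { return isSet(VISIBLE); }
    public void setVisible(boolean v)   { set(VISIBLE, v); }
    public boolean isManaged()          { return isSet(MANAGED); }
    public void setManaged(boolean v)   { set(MANAGED, v); }
    public boolean isDisabled()         { return isSet(DISABLED); }
    public void setDisabled(boolean v)  { set(DISABLED, v); }
}
```

Whether this actually saves memory depends on how the JVM lays out and pads the object, so it would need to be measured per class.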
Decora
• RT-2892: Improve performance of Gaussian-based effects
• RT-2908: Use scaled kernel to improve DropShadow performance for node scale factors < 1
• RT-5347: Prism: finish drop/inner shadow optimizations
• RT-5420: DropShadow effects significantly affect performance
• RT-6935: ColorAdjust effect consumes a lot of memory which could lead to OOM exception
• RT-8890: Merge and some Blend effects should optimize rendering to destination
• RT-9225, RT-9226, RT-9227: Various effects don't limit the size of the input image when requests are outside the clip
• RT-9432: Some of the hand-tuned software effect peers are not optimized for use with transformed inputs
• RT-9433: The auto-generated software peers for the effects filters do not handle transformed inputs optimally
• RT-9434: Reflection effect does not clip its output image to the requested clip bounds
• RT-9437: Prism and Hardware Swing pipelines could perform PerspectiveTransform directly
• RT-13714: Implement ColorAdjust as a matrix multiplication
Text
• *RT-23467*: Evaluate Native Text Engines
• *RT-23578*: Consider pre-populating the glyph cache with data for the default font at the default size(s)
• *RT-23705*: Reduce the amount of glyph data copied via Java from native to see if it helps performance
• *RT-23708*: Investigate if a segmented glyph cache can help performance
• *RT-30158*: Investigate String Measurement in FX (cache results, call less, …)
• RT-5069: Text node computes complete text layout, even if clipped to a much smaller size
• RT-6475: Need new hints to control how Text node is rendered
• RT-21269: Font#loadFont(String,double) downloads file in the main thread
• RT-23579: Consider using a fixed interval for glyph cache for faster computation
• RT-23580: Add a variant of text smoothing to deal with rotated text at higher versus lower quality
• RT-24329: LCD font smoothing performance
• RT-24565: Beagle: Complex Text implementation generates big swing in frame rate
• RT-24941: 8.0-graphics-scrum-h90: GlyphCache.render() takes up to 200ms which results in jerky rendering
• RT-26111: Use glyph bounding boxes to get visual bounds
• RT-26894: String rendering is less performant than java2D one
Scene Graph
• *RT-23346*: Provide API access to multiple hardware screen layers
• RT-5477: Improve performance and reduce garbage when animating gradients
• RT-5525: Group will get bounds change notification when child's bounds change, even if change in child didn't alter bounds of Group
• RT-9390: Improve picking performance using Dmitri's algorithm (or other)
• RT-9571: Consider adding image caching for images loaded from remote URLs
• RT-10604: Recomputing bounds when effects are used even if not dirty
• RT-10681: Reevaluate only changed KeyFrames
• RT-12105: Fix for RT-11562 disables an optimization for calculating content bounds
• RT-12136: SortedList possible optimizations
• RT-12137: FilteredList possible optimizations
• RT-12564: Layout spends considerable time in getManagedChildren
• RT-12715: Node.toBack()/toFront() are inefficient
• RT-13593: Performance of PathTransition sucks
• RT-19221: Padding for round cap could be optimized in Line
• RT-19222: Optimize impl_configShape of Path
• RT-20455: Do not always recreate the whole geometry in calls to impl_configShape
• RT-23312: OutOfMemoryError after pressing Ctrl+Alt+Del or minimizing the window whilst animating a canvas
• RT-24587: Changing a single child of FlowLayout is slower than changing all children
• RT-26007: Mouse event post-processing does unnecessary work, may be incorrect altogether
• RT-29717: Do not wrap notifications in ObservableList wrappers when no listeners are set
Prism
• *RT-15118*: Need to consider architectural changes for doing transforms in prism
• *RT-15839*: Complex animated content in a ScrollPane is jerky although little is seen
• *RT-17396*: Shader based 2D path rendering
• *RT-17582*: Render the scene using retained data structures
• *RT-20356*: PresentingPainter and UploadingPainter disregarding dirty clip rect
• *RT-20405*: Improve Path rendering performance
• *RT-23371*: FB: Render windows on separate hardware layers
• *RT-23450*: Improve performance of Prism rendering and clipping
• *RT-23462*: Create "CommandBuffer" for storing graphics drawing commands in Prism
• *RT-24168*: View.uploadPixels could take a source rectangle to upload only a portion of the pixels
• *RT-30271*: No culling if the only dirty region contains the clip
• *RT-30361*: Consider rendering directly to frame buffer instead of RTT
• *RT-30440*: Eliminate redundant OpenGL calls
• ***RT-30741***: Super Shader
• *RT-30746*: Don't fill transparent rectangles, cache more textures to avoid buffer flush
• *RT-30748*: Use Vertex Shader to provide clipping instead of Scissor test
• RT-5835: Fix for RT-5788 disabled an optimization for anti-aliased rectangles
• RT-6968: Prism should support 2-byte gray-alpha .png format
• RT-8722: Strokes and fills of Paths slower than flash
• RT-9682: Optimize shadow effects for rounded rectangles
• RT-10369: Optimize blurs in shaders
• RT-12400: Delays in D3D Present affect performance
• RT-14058: Consider possibility to eliminate using of BasicStroke.tmpMiter
• RT-14216: MultipleArrayGradient uses a lot of memory
• RT-14358: Insertion sort in OpenPisces ScanlineIterator may be very inefficient
• RT-14421: Branch YCbCr shader may reduce performance on slower hardware
• RT-15516: image data associated with cached nodes that are removed from a scene are not aggressively released
• RT-17507: Optimize non-uniform round rect rendering in Regions
• RT-17510: Improve performance of rendering a TRANSPARENT stage on Windows 7
• RT-17551: MacOS: Optimize using lockFocusIfCanDraw
• RT-18060: Evaluate whether enabling multithreaded GL engine on Mac benefits Mac JFX performance
• RT-18140: Consider using nearest-neighbor when smooth=false for SW pipeline to improve performance
• RT-18417: Investigate Mac runtime code for possible native code optimizations using GCD (Grand Central Dispatch)
• RT-19556: Consider removing usage of DirectByteBuffer and ByteBuffer.allocateDirect
• RT-19576: Pixel readback performance for the ES2 pipeline has room for improvement
• RT-21025: iOS: DirtyAreaTest on iOS is slower than we like
• RT-22430: Use 'fillQuad' vs. 'fillRect' for pixel aligned rectangular regions
• RT-22431: Optimize Charts drawing to use filled quads
• RT-23464: Reduce Vertex Buffer Overhead: Constant Color Attribute vs. Array Color Attributes
• RT-23465: Using TriangleStrip instead of Triangles
• RT-23466: Improve Vertex Buffer Usage: Structure of Arrays vs. Array of Structures
• RT-23471: Add new Etched effect
• RT-23574: Add support for tiled rendering of textures (both for performance and functional reasons)
• RT-23575: Need a more compact representation for text data
• RT-23576: Ability to add hand-coded shaders (bypassing JSL)
• RT-23577: Support for geometry shaders on graphics chips that support it
• RT-23581: Add ability to render 9-slice directly in Prism graphics
• RT-23725: Beagleboard: Execute fragment shader on the GPU causes significant drop in performance
• RT-23742: Gradient is slow on embedded systems
• RT-24104: Native Pisces rasterizer is slower on desktop Linux platforms
• RT-24339: Add a short-cut to dirty region code based on parent / child bounds ratio
• RT-24557: ImagePattern is slow on embedded systems
• RT-24624: prism-sw pipeline is up to 90% worse than j2d pipeline
• RT-25166: Path updates in a ScrollPane where content has a Scale transform are 100 times slower
• RT-25603: Mac optimization: Investigate layers async vs sync setting
• RT-25694: Rewrite (AA)TessShapeRep classes in order to avoid unnecessary translations
• RT-25864: New "shared textures" do not share pixel update flags as well as they should
• RT-26531: Provide independent stage performance
• RT-28222: Don't render transparent rectangles
• RT-28305: NGRegion optimizations based on Color.TRANSPARENT are ineffective
• RT-28670: Create a roundrect renderer that uses the new "texture primitive" based shaders used currently for ellipses and rects
• RT-28752: Mac: 8.0-graphics-scrum-792: up to 30% performance regression on MacOS
• RT-29542: FX 8 3D: Mesh computation code needs major clean up or redo
• RT-30360: Create fewer temporary objects in Quantum
• RT-30589: preprocess remove comments from ES2 3D shaders
• RT-30710: 8.0-graphics-scrum-1194: 20% performance regression in Bitmap benchmarks in SW pipeline
• RT-30745: Remove Flush & Finish in ES2SwapChain
• RT-30747: Introduce a low cost clipping API for simple rectangle based clipping
Media
• RT-11379: video playing with MediaPlayer slows down refreshes to Java2D component
• RT-16420: MediaPlayer/View loses frames from video streams encoded at 25,30,60 fps
• RT-17861: Use shaders to assist video decoding on the GPU
• RT-20890: Too many open files and Memory leak
Web
• RT-24320: WebView draws entire back buffer on screen upon every repaint
• RT-24998: Please enable Javascript JIT for 64 bit
• RT-16848: Optimize Unicode implementation
• RT-18909: Extend support for composite operations in Prism Graphics
• RT-19625: Better support for Webnode to improve rendering performance
• RT-20501: Prism needs to provide proper APIs to support the Webnode team to improve webnode rendering performance
• RT-21629: Slow and never-ending rendering of page
• RT-21722: html5 video inside is slow
• RT-22008: Zero size WCGraphicsPrismContext.Layer handling is not perfectly efficient
• RT-30083: netflix.com: vertical scrollbar is tremendously slow
Threading
• *RT-2893*: Enable multi-threaded processing of software-based effects when >= 2 cores available
• *RT-26702*: Poor DisplacementMap effect performance on Mac
Interop
• RT-22133: Performance: JavaFX Webview QuantumRenderer$PipelineRunnable.run() and WinApplication._runLoop() take up more than half the time in a JDeveloper operation
• RT-22567: Minor tweaks to FX/Swing painting
• RT-22705: Simple animation runs at lower FPS when embedded into JFXPanel
• RT-24278: JFXPanel with simple animation consumes entire CPU core
• RT-26993: Noticeable jerkiness when running JFXPanelBitmapBenchmark on MacOS
Benchmarks
• RT-7644: Math.floor and Math.ceil take up a lot of cpu time
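One common alternative to Math.floor and Math.ceil, in places where the input is known to be finite and within int range, is a cast plus a correction. The sketch below is an illustration only: the names FastMath, fastFloor and fastCeil are hypothetical, and this is not necessarily the fix chosen for RT-7644. Whether it is actually faster has to be measured on the target VM.

```java
public class FastMath {
    /**
     * Floor of x as an int. Assumes x is finite and fits in an int;
     * NaN and out-of-range values are NOT handled like Math.floor.
     */
    static int fastFloor(double x) {
        int i = (int) x;           // the cast truncates toward zero
        return x < i ? i - 1 : i;  // step down for negative non-integers
    }

    /** Ceiling of x as an int, under the same assumptions as fastFloor. */
    static int fastCeil(double x) {
        int i = (int) x;           // the cast truncates toward zero
        return x > i ? i + 1 : i;  // step up for positive non-integers
    }
}
```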
Controls
• *RT-24105*: TabPane renders content of all tabs even if only one is active
• *RT-30452*: Setting clip on TableCellSkinBase is incorrect
• *RT-30552*: Label: resolve LabelSkinBase's use of clips for text
• *RT-30568*: Reduce unnecessary calls to setManaged(true) in Controls
• *RT-30576*: Parent: add new public layout method, optimized to only layout this parent and its children
• *RT-30648*: Investigate API for TabPane's Tab Content Loading policy
• RT-9094: VirtualFlow requests data from model too frequently
• RT-10034: Performance optimizations around SelectionModel implementations
• RT-13792: Investigate caching in controls (NOTE: Unlikely to be any win)
• RT-16529: Memory Leak: event handlers of root TreeItem are not removed
• RT-16853: TextArea: performance issue
• RT-18934: TextArea.appendText/deleteText may be very slow
• RT-20101: [ComboBox] Custom string converter is applied too many times
• RT-23825: Controls need a lifecycle API
• RT-24102: CSS Loading: Split caspian.css into multiple smaller component parts.
• RT-25652: Memory Leak in TabPane
• RT-25801: 8.0-controls-scrum-h81: 25% performance regression in Controls.RadioButton on mac-low end machine
• RT-26716: Scrolling the tail of a TreeView is much slower than scrolling its head
• RT-26999: 8.0-controls-scrum-h122: up to 20% regression in some Controls.TableView benchmarks
• RT-27725: 8.0-controls-scrum-h186: 22% footprint increase in ChoiceBox control
• RT-27986: Spinning progress indicator overlapping an image plays havoc with RDP
• RT-29055: java.lang.OutOfMemoryError: Java heap space error in switching between caspian to modena theme in Modena App
• RT-30305: 8.0-controls-scrum-569: 42% performance regression in Controls.ListView-Keyboard
• RT-30713: VirtualFlow creates new cells in some instances
• RT-30824: TableView TableCell memory issue in javaFX 8.x
Embedded
• *RT-30721*: Provide flag to turn on PRESERVED mode in EGL
• *RT-30722*: Provide an option for 16-bit opaque frame buffer on the Raspberry PI
• *RT-30723*: EGL: Disable clipping when clearing frame buffer
• RT-24685: Virtual keyboard initialization is slow
• RT-24937: Use a C/C++ compiler that can take advantage of NEON
• RT-25943: Need to consider specific OpenGL extension on embedded system
• RT-25995: Prism porting layer function to query platform VRAM
• RT-27590: Evaluate effect of ProGuard on runtime size
• RT-28012: EGLFB: RAM allocation should be reduced
• RT-28029: Improve EGLFB dialog / popup response time
• RT-30719: Enabled video underlays on Raspberry PI
CSS
• *RT-28966*: CSS creates new objects for complex values which trigger redundant processing including rendering
• *RT-30381*: fx8.0-b86: CSS code for modena css rules with multiple selectors is not optimized
• RT-11506: Short circuit CSS if CSS is not relevant to the Node
• RT-11881: Some css selectors in caspian.css will turn the CSS processing on for all the parents
• RT-11882: Under current conditions, every Node is processing CSS
• RT-23468: Remove use of List in CSS internals in favor of arrays
• RT-30817: lazy deserialization of css declarations
• RT-30818: CSS: Avoid creating ObservableList for declarations and selectors in Rule
FXML
• *RT-23527*: Compile FXML to .class file
Tooling
• RT-13312: Develop GLBenchmark to get baseline performance on any particular hardware
• RT-13313: Performance framework (GPU usage)
• RT-18326: Implement performance counters (prism.printStats) feature for prism-es2 pipe
• RT-26560: Option to track texture memory allocation
• RT-30651: 8.0-graphics-scrum-1216: full speed mode seems to be broken
Startup
• RT-14930: JNLP-start consumes large amount of time
• RT-20159: Startup regression in controls scrum #371
On Jul 3, 2013, at 9:56 AM, Richard Bair <richard.bair at oracle.com> wrote:
>> Obviously there's a lot going on with the move to gradle, but we are a few lines of Gradle build code away from JFX on iOS. I'm keen to find out just how well it will run.
>
> In the runs I've seen (not on RoboVM) the main bottleneck is in graphics rendering. We don't know specifically why yet, but we have a lot of ideas. Now that Tobi reports FX + RoboVM (including fonts!) is working, I'm eager to see the performance characteristics as well.
>
> With the work you've done on the developer workflow and now that we've got an open build running on the device, we are going to need to get organized around measuring, reporting, and fixing performance issues encountered on the device. Likely some of it will be RoboVM related, but there is plenty of optimization to do in Prism as well.
>
> We've learned a lot about embedded hardware over the last year or so. Some of the things we've learned:
> - It is almost *always* fill rate limited
> - Pixel shader complexity costs you
> - CPU -> GPU bandwidth is very limited
>
> Solving the fill rate issue is huge. The Android team reckons that you can overwrite the same pixel maybe 2x before you start noticeably losing performance, 3x or more and you're dead. It doesn't even matter what it is you are doing per-pixel (could be simply filling each pixel with a solid color). The fact that you are running a pixel shader for 3x or 4x the number of pixels taxes the hardware.
>
> So for example, right now I believe we are doing 3x overdraw before we even do anything. I think first we do a clear, then we fill with black, then we fill with the Scene fill color. Then we draw whatever you give us. Obviously this is not optimal!
>
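To put rough numbers on the overdraw described above, here is a back-of-the-envelope sketch. It assumes every pass (clear, black fill, Scene fill) touches every pixel of the frame, and the 1024x768 resolution is just a hypothetical example, not a measurement from any device.

```java
public class OverdrawEstimate {
    /** Pixels written per frame when each pass covers the whole frame. */
    static long fragmentsPerFrame(int width, int height, int fullScreenPasses) {
        return (long) width * height * fullScreenPasses;
    }

    public static void main(String[] args) {
        int width = 1024;   // hypothetical frame buffer size
        int height = 768;

        long singlePass = fragmentsPerFrame(width, height, 1);
        long threePasses = fragmentsPerFrame(width, height, 3);

        System.out.println("1 pass:   " + singlePass + " pixels per frame");
        System.out.println("3 passes: " + threePasses + " pixels per frame");
        // The application's own content is drawn on top of this baseline.
    }
}
```

Under these assumptions the three setup passes alone write three times the pixels of the frame before any scene content is drawn.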
> For pixel shader complexity -- you can probably get away with more complex pixel shaders if they are only running 1x per pixel, but when they are running 3x or 4x per pixel then the complexity of the pixel shaders burns you. We did a lot of optimizations here already so hopefully we've got this one in good shape. But just something to be aware of.
>
> The CPU -> GPU bandwidth problem is one that is systemic with all these mobile devices. Higher bus speeds == less battery life, so the devices are designed with low bus speeds and this makes transfer of data between CPU and GPU costly. Games will typically do all the transfer once up front (all the graphics assets for a level are loaded up front) and then during the game they are just adjusting the viewport & vertices (often in vertex shaders so as not to pass much data down to the card), etc. Right now we are doing a tremendous amount of communication with the GPU. Ironing this out is the basis for the "super shader" (https://javafx-jira.kenai.com/browse/RT-30741).
>
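The "transfer once up front" pattern that games use can be sketched as a small cache. This is a pure-Java simulation under stated assumptions: UploadOnceCache, bind and the integer handles are hypothetical names, no real GPU call is made, and it is not how Prism or the super shader is implemented. It only shows that repeated use of the same asset should cost one transfer, not one per frame.

```java
import java.util.HashMap;
import java.util.Map;

public class UploadOnceCache {
    // Asset name -> simulated GPU-side handle for data already transferred.
    private final Map<String, Integer> resident = new HashMap<>();
    private long bytesTransferred;
    private int nextHandle = 1;

    /**
     * Returns a handle for the asset, "uploading" its pixels only the
     * first time the name is seen. Later calls reuse the resident copy.
     */
    int bind(String name, byte[] pixels) {
        Integer handle = resident.get(name);
        if (handle == null) {
            bytesTransferred += pixels.length; // stands in for the real CPU -> GPU copy
            handle = nextHandle++;
            resident.put(name, handle);
        }
        return handle;
    }

    /** Total bytes that crossed the simulated CPU -> GPU bus. */
    long bytesTransferred() {
        return bytesTransferred;
    }
}
```

Binding the same 1024-byte asset on every frame then costs 1024 bytes in total, regardless of how many frames are drawn.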
> I would recommend anybody interested in performance keep the "Open Performance Issues" filter on their JIRA dashboard. There is a link to 221 performance issues (most of which are ideas about things to do to improve performance). We also need to close the loop on the other issues we were discussing about jerkiness a couple weeks ago.
>
> Richard