java2d performance java7 / java8

Fri Oct 28 16:41:47 UTC 2016

Clarification:

"Quartz is -7% to +24% (avg. 4%) as fast as OpenGL"

should read:

"Quartz is -7% to +24% (avg. 4%) faster than OpenGL"

That is, on the 2009 Mini, the two drawing methods perform similarly
when displaying the images generated by 3D applications (bear in mind
that my application is a high-speed remote display/remote access tool.)
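
The difference between the two phrasings is arithmetic: "N% faster" corresponds to a throughput ratio of 1 + N/100, so "-7% to +24% faster" means roughly 0.93x to 1.24x as fast. A minimal sketch of the conversion follows; the class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical helper (not part of any benchmark code in this thread):
// converts between "N times as fast" and "N percent faster".
final class SpeedRatio {
    private SpeedRatio() {}

    // "1.24x as fast" -> "24% faster"; "0.93x as fast" -> "-7% faster".
    static double percentFaster(double timesAsFast) {
        return (timesAsFast - 1.0) * 100.0;
    }

    // "24% faster" -> "1.24x as fast".
    static double timesAsFast(double percentFaster) {
        return 1.0 + percentFaster / 100.0;
    }

    public static void main(String[] args) {
        // The 3D workload range from the clarification above.
        System.out.printf("-7%% faster  = %.2fx as fast%n", timesAsFast(-7.0));
        System.out.printf("+24%% faster = %.2fx as fast%n", timesAsFast(24.0));
    }
}
```

In the same terms, the "2-11x as fast" figures quoted below for the 2D workloads would be 100% to 1000% faster.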

On 10/28/16 11:36 AM, DRC wrote:
> I've been able to optimize my code somewhat, such that it restricts
> OpenGL drawing only to the changed regions of the framebuffer.  That at
> least made OpenGL Java 2D blitting usable for my application.  However,
> the Quartz Java 2D implementation under Java 6 is still much faster in
> many cases.  I have three machines I can test against:
> 
> - a 2009 Mac Mini, Intel Core 2 Duo, nVidia GeForce 9400, Mountain Lion
> - a 2011 Macbook Pro, Intel Core i5, Intel HD Graphics 3000, Mavericks
> - a 2014 Mac Mini, Intel Core i7, Intel Iris Graphics, Yosemite
> 
> On the 2009 Mini, Quartz is 2-11x (avg. 4.7x) as fast as OpenGL on the
> eight 2D application workloads that I'm testing (these workloads draw a
> lot of very small regions of the framebuffer.)  On the twelve 3D
> application workloads (which tend to mostly draw large areas of the
> framebuffer), Quartz is -7% to +24% (avg. 4%) as fast as OpenGL.
> 
> On the 2011 MB Pro, Quartz is 4-25x (avg. 10x) as fast as OpenGL on the
> eight 2D application workloads and 1.5-2.3x (avg. 1.9x) as fast as
> OpenGL on the twelve 3D application workloads.
> 
> Here's the kicker, though-- it appears that the Quartz-accelerated Java
> 2D blitter is disabled under Yosemite and later, so on my 2014 Mini
> (which requires at least Yosemite), Java 8 (which always uses OpenGL) is
> always much faster than Java 6 (which appears to use an unaccelerated
> Java 2D blitter under OS X 10.10+.)  I verified with virtual machines
> that this phenomenon is O/S-related and not hardware-related.  It seems
> that Java 6 always disables Quartz blitting on Yosemite and later,
> regardless of the machine.
> 
> Unfortunately, because this Quartz-accelerated Java 2D blitter never
> made it into OpenJDK, because Apple discontinued Java for OS X, and
> because-- even on older hardware-- you can't use the Quartz-accelerated
> blitter on newer macOS releases, our only choice now is OpenGL.  That
> isn't always the fastest drawing method on Macs.  For instance,
> comparing the two Mac Mini models with their fastest drawing method, I
> observe that the 2009 Mini (Quartz, Java 6) is about twice as fast as
> the 2014 Mini (OpenGL, Java 8) on the 2D application workloads.  On the
> 3D application workloads, the 2014 Mini (OpenGL, Java 8) is about 40%
> faster than the 2009 Mini (Quartz, Java 6.)
> 
> In short, this is still an issue.  Under certain workloads, my modern
> machine is performing half as fast as a 2009 machine, because of the
> inability to use Quartz for blitting.
> 
> 
> On 10/28/16 4:54 AM, Tobi wrote:
>> Any news here Sergey?
>>
>>
>>
>>
>>> On 17.02.2015 at 15:01, Sergey Bylokhov <sergey.bylokhov at oracle.com> wrote:
>>>
>>> Hello,
>>> Thanks for the info! I am able to reproduce this bug even on Windows (GDI vs. OpenGL). I will take a look at it.
>>>
>>> On 12.02.2015 8:28, DRC wrote:
>>>> On 2/10/15 7:52 AM, Sergey Bylokhov wrote:
>>>>> You can run this test on jdk 8u31 and 8u40 to see a difference:
>>>>> http://cr.openjdk.java.net/~serb/8029253/webrev.04/test/java/awt/image/DrawImage/UnmanagedDrawImagePerformance.java.html 
>>>>>
>>>>> And the test from this bug report:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8017247
>>>>
>>>> Having looked at those tests, I can say that they are definitely not related to the issue I'm seeing here.  Although the TurboVNC Viewer (my application) does use bilinear interpolation if desktop scaling is enabled, that is not the "common" use case.  Normally, it's just going to be drawing a BufferedImage with no interpolation, so that at least clarifies that I shouldn't be expecting any different behavior with Java 9.  The question now becomes:  how to optimally take advantage of the OpenGL pipeline.  As you pointed out (and I agree, based on my research), reducing the software-to-surface blits is key, although I don't have a firm grasp on how to do that.  My code is basically just doing the following:
>>>>
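>>>>  public void paintComponent(Graphics g) {
>>>>    Graphics2D g2 = (Graphics2D) g;
>>>>    // scalingEnabled: placeholder for the viewer's "scaling enabled" check
>>>>    if (scalingEnabled) {
>>>>      // Bilinear interpolation is only requested when desktop scaling is on.
>>>>      g2.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
>>>>                          RenderingHints.VALUE_INTERPOLATION_BILINEAR);
>>>>      g2.drawImage(im.getImage(), 0, 0, scaledWidth, scaledHeight, null);
>>>>    } else {
>>>>      // Common case: draw the BufferedImage unscaled, with no interpolation.
>>>>      g2.drawImage(im.getImage(), 0, 0, null);
>>>>    }
>>>>    g2.dispose();
>>>>  }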
>>>>
>>>>  public void updateWindow() {
>>>>    Rect r = damage;
>>>>    if (!r.isEmpty()) {
>>>>      // scalingEnabled: placeholder for the viewer's "scaling enabled" check
>>>>      if (scalingEnabled) {
>>>>        // (code omitted here: mainly adjusts the coordinates for scaling)
>>>>        paintImmediately(x, y, width, height);
>>>>      } else {
>>>>        paintImmediately(x, y, width, height);
>>>>      }
>>>>      damage.clear();
>>>>    }
>>>>  }
>>>>
>>>> As VNC rectangles from the server are decoded, the "damage" rectangle gets updated to reflect the extent of the "damaged" pixels, and that extent is passed into paintImmediately().  In examining the OpenJDK source, however, it appears that glDrawPixels() is always called with the full extent of the BufferedImage, regardless of whether only a small portion of that image has actually changed.  If there is something else I can do to help debug this, please let me know.  I have a working JDK build.  I fully admit that I may be doing something wrong or suboptimally, but bear in mind that I've spent probably over 100 hours on this, so it's not as if I'm a naive n00b here.  If there's something I'm missing, then trust me that it isn't obvious!
>>>>
>>>>
>>>>> Can you share standalone jar file of this workload?
>>>>
>>>> Here is everything you need to reproduce the issue:
>>>> http://www.turbovnc.org/turbovnc_mac_performance_stuff.tar.gz
>>>>
>>>> Untar, then do:
>>>>  > cd turbovnc_mac_performance_stuff
>>>>  > java -server -d64 -Dsun.java2d.trace=count -cp VncViewer.jar com.turbovnc.vncviewer.ImageDrawTest
>>>>    (let it run for 20 seconds or so, then CTRL-C it.)
>>>>  > java -server -d64 -jar VncViewer.jar -bench compilation-16.rfb -benchiter 3 -benchwarmup 2
>>>>    (let it run to completion.)
>>>>
>>>>  Results from Java 6u51 on my Mac Mini (2009 vintage, 2 GHz Intel Core 2 Duo, nVidia GeForce 9400):
>>>>  ImageDrawTest:   ~100 Mpixels/sec
>>>>    (all calls are to sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntArgbPre))
>>>>  compilation-16:  Average 1.392763 s (Decode = 0.198173 s, Blit = 1.005974 s)
>>>>
>>>>  Results from Java 8u31 on my Mac Mini:
>>>>  ImageDrawTest:   ~70 Mpixels/sec
>>>>    (Calls are split between
>>>>     sun.java2d.opengl.OGLRTTSurfaceToSurfaceBlit::Blit("OpenGL Surface (render-to-texture)", AnyAlpha, "OpenGL Surface") and
>>>>     sun.java2d.opengl.OGLSwToSurfaceBlit::Blit(IntArgbPre, AnyAlpha, "OpenGL Surface"))
>>>>  compilation-16:  Average 6.216550 s (Decode = 0.194989 s, Blit = 5.534781 s)
>>>>
>>>>  Results from Java 8u31 on my Mac Mini without alpha-enabled image (-Dturbovnc.forcealpha=false):
>>>>  ImageDrawTest:   ~18 Mpixels/sec
>>>>    (Calls are split between:
>>>>     sun.java2d.opengl.OGLRTTSurfaceToSurfaceBlit::Blit("OpenGL Surface (render-to-texture)", AnyAlpha, "OpenGL Surface") and
>>>>     sun.java2d.opengl.OGLSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "OpenGL Surface"))
>>>>  compilation-16:  Average 27.153480 s (Decode = 0.200333 s, Blit = 26.523137 s)
>>>>
>>>> So, as you can see, using an alpha-enabled image improved the performance under Java 7/8 by about 4x, both when drawing large images (ImageDrawTest) and when doing smaller image updates (compilation-16.) However, the blitting performance under Java 7/8 for small image workloads is still about 5x slower than it was under Java 6.  Results from a different machine:
>>>>
>>>>  Results from Java 6u51 on my Macbook Pro (2011 vintage, 2.4 GHz Intel Core i5, Intel HD Graphics 3000):
>>>>  ImageDrawTest:   ~100 Mpixels/sec
>>>>    (all calls to sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntArgbPre))
>>>>  compilation-16:  Average 0.592772 s (Decode = 0.113879 s, Blit = 0.351596 s)
>>>>
>>>>  Results from Java 8u31 on my Macbook Pro:
>>>>  ImageDrawTest:   ~66 Mpixels/sec
>>>>    (Calls split between
>>>>     sun.java2d.opengl.OGLRTTSurfaceToSurfaceBlit::Blit("OpenGL Surface (render-to-texture)", AnyAlpha, "OpenGL Surface") and
>>>>     sun.java2d.opengl.OGLSwToSurfaceBlit::Blit(IntArgbPre, AnyAlpha, "OpenGL Surface"))
>>>>  compilation-16:  Average 6.806324 s (Decode = 0.188252 s, Blit = 6.457852 s)
>>>>
>>>>  Results from Java 8u31 on my Macbook Pro without alpha-enabled image (-Dturbovnc.forcealpha=false):
>>>>  ImageDrawTest:   ~50 Mpixels/sec
>>>>    (Calls split between
>>>>     sun.java2d.opengl.OGLRTTSurfaceToSurfaceBlit::Blit("OpenGL Surface (render-to-texture)", AnyAlpha, "OpenGL Surface") and
>>>>     sun.java2d.opengl.OGLSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "OpenGL Surface"))
>>>>  compilation-16:  Average 10.272508 s (Decode = 0.147805 s, Blit = 9.955666 s)
>>>>
>>>> Using an ARGB_PRE BufferedImage didn't help out nearly as much on this machine, and whereas the large image performance looks similar to that of the Mac Mini, the small image blitting performance still suffers by nearly a factor of 20 (although it is improved-- before the use of ARGB_PRE images, it was about a factor of 30 slower.)
>>>>
>>>> The architecture of this solution makes the use of VolatileImages impractical-- basically, I have to decode the VNC rectangles in real time as they arrive, so if the VolatileImage were to go away, I would have no way of rebuilding it.
>>>
>>>
>>> -- 
>>> Best regards, Sergey.
>>>
>>
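
The damage-rectangle scheme described in the quoted February 2015 message can be sketched roughly as follows. This is not the TurboVNC Viewer's code: DamageTracker, add, takeDamage, and blitRegion are hypothetical names, and java.awt.Rectangle stands in for the viewer's own Rect class. The sketch accumulates the bounding box of the decoded rectangles and then draws only that sub-rectangle of the BufferedImage, using the source/destination-rectangle form of drawImage(). Whether the OpenGL pipeline then uploads only that region is an assumption that would need to be checked, for example with -Dsun.java2d.trace=count.

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical sketch: tracks the bounding box of changed pixels and
// blits only that region of a BufferedImage.
final class DamageTracker {
    private final Rectangle damage = new Rectangle(); // starts out empty

    // Grow the damage rectangle to cover a newly decoded rectangle.
    synchronized void add(int x, int y, int w, int h) {
        Rectangle r = new Rectangle(x, y, w, h);
        if (r.isEmpty()) return;
        if (damage.isEmpty()) {
            // Rectangle.union() would treat an empty (0,0,0,0) rectangle
            // as the point (0,0), so handle the first update explicitly.
            damage.setBounds(r);
        } else {
            damage.setBounds(damage.union(r));
        }
    }

    // Return the accumulated damage and reset it to empty.
    synchronized Rectangle takeDamage() {
        Rectangle r = new Rectangle(damage);
        damage.setBounds(0, 0, 0, 0);
        return r;
    }

    // Draw only the given region of src, at the same position in the
    // destination, after clipping the region to the image bounds.
    static void blitRegion(Graphics2D g2, BufferedImage src, Rectangle r) {
        Rectangle c = r.intersection(
            new Rectangle(0, 0, src.getWidth(), src.getHeight()));
        if (c.isEmpty()) return;
        int x2 = c.x + c.width, y2 = c.y + c.height;
        g2.drawImage(src, c.x, c.y, x2, y2, c.x, c.y, x2, y2, null);
    }

    public static void main(String[] args) {
        DamageTracker t = new DamageTracker();
        t.add(10, 10, 4, 4);
        t.add(100, 50, 8, 8);
        System.out.println("Damage: " + t.takeDamage());
    }
}
```

In a Swing component, the rectangle returned by takeDamage() would be what gets passed to paintImmediately(), and paintComponent() could call blitRegion() with the Graphics clip bounds instead of drawing the whole image.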