[OpenJDK 2D-Dev] [9] request for review: 8087201: OGL: rendering of lcd text is slow
Phil Race
philip.race at oracle.com
Wed Jun 24 18:40:37 UTC 2015
On 6/24/15 8:18 AM, Andrew Brygin wrote:
> Hello Phil,
>
> please see my comments inline.
>
> 23/06/15 21:29, Phil Race wrote:
>> Hi Andrew,
>>
>> Overall the fix looks good. A few questions.
>>
>> 1. Regarding translucent surfaces, do you know when Swing
>> has a translucent backbuffer and when it does not?
>> It has been noted that we now have LCD text in some cases
>> in SwingSet2 (SS2) but apparently still not in NetBeans (NB).
> I did not notice LCD text in the SwingSet2 demo without explicitly
> switching to opaque backbuffers in the RepaintManager.
>
> My expectation is that standard Swing components should not
> use LCD text on Mac OS X at the moment. However, if there are
> (custom?) components which create opaque buffers separately
> from the RepaintManager, then they may be able to use LCD text.
See https://bugs.openjdk.java.net/browse/JDK-8098853 and/or ask Yuri.
>>
>> 2. Where are we likely to find (or not find) support for this
>> extension?
>>
>> Ironically, based on your results, it seems that the Nvidia card is the
>> one case that did not support the extension. Is that because it was
>> running an older version of OS X than the others?
>
> Unfortunately, the extension is relatively new, and we need new drivers
> to use it. The MBP with the Nvidia GF9600M is running OS X 10.8,
> where we cannot use the extension. However, this extension is listed
> as supported for the GF9600M in the extension database, so we can
> probably expect that upgrading to OS X 10.9 or 10.10 will make it available.
OK.
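Since availability varies by driver and OS version, the fast path has to be gated on a runtime check. A minimal sketch of the classic whole-token test against the extension string, assuming a legacy or compatibility-profile context where glGetString(GL_EXTENSIONS) returns one space-separated list (on core profiles one would iterate glGetStringi(GL_EXTENSIONS, i) up to GL_NUM_EXTENSIONS instead). The helper name has_gl_extension is hypothetical, not the JDK's.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` occurs as a whole,
 * space-delimited token in `extensions` (e.g. the string returned by
 * glGetString(GL_EXTENSIONS) on a compatibility context). A bare
 * strstr() is not enough: it would also accept a longer name that merely
 * starts with `name`. Extension names contain no spaces, so skipping past
 * a rejected match cannot miss a valid one. */
static bool has_gl_extension(const char *extensions, const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (extensions == NULL || len == 0) {
        return false;
    }
    for (const char *p = strstr(extensions, name); p != NULL;
         p = strstr(p + len, name)) {
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) {
            return true;
        }
    }
    return false;
}
```

Note that, per the Windows NVS5400 numbers in this thread, the extension being present does not by itself guarantee that the barrier path is the faster one.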
>
> The limited availability of the extension is the main reason to look for
> alternative solutions. The best option would be to identify and eliminate
> the cause of the glCopyTexSubImage2D() slowness. There are some reasons
> to think that this is possible:
> * a separate simple OGL demo shows almost equal performance for
> glCopyTexSubImage2D() and for re-using the FBO texture.
> * on Windows, the performance of glCopyTexSubImage2D is much better
> in the case of FBO.
>
> However, at the moment I do not see what we are doing wrong or
> suboptimally with the standard approach.
By standard approach, what exactly do you mean?
>>
>> 3. The performance 'lost' case.
>> > However, on systems where the fast path with the destination texture
>> > is not possible for any reason, this change may cause a performance
>> > degradation because of more extensive usage of glCopyTexSubImage2D.
>> > So, we may want a means to configure the cell dimension
>>
>> Is this a reference to losing performance on non-retina displays
>> where we would be better off with the smaller cache cell size?
Was the answer to this 'yes' ?
>>
>> I suppose the importance of this depends in part on the answer to
>> question #2.
> Probably the most important case here is older OS X (< 10.9) systems.
All the JDK 8 updates support 10.8.3+, so I suppose that is the main case,
but I expect it to 'go away' for JDK 9, or perhaps earlier once Apple stops
supporting it.
> Also, Windows systems with OGL drivers released before 2011-2012.
> However, OGL is an optional pipeline on Windows, so this could be less
> critical.
I think older Windows drivers are something we would encourage everyone
to get off ASAP anyway.
>>
>> 4. Have you tried this on Linux, or even with a Windows OGL driver?
> I have uploaded results for a linux system with NVS5400:
> http://cr.openjdk.java.net/~bae/8087201/9/linux-x64-bench.txt
>
> Here we have the NV_texture_barrier extension, and get up to a 10x-20x
> speedup in some tests.
>
> On Windows, I got mixed results:
> * Intel HD4000: no extension due to old drivers, so the same results
> as without the fix.
> * NVS5400: with the fix we got scores similar to those on Mac OS X,
> but the standard path with glCopyTexSubImage2D gives better results
> anyway, i.e. with the fix we achieve only 55%-60% of the original
> performance.
That is very interesting. Is that related to your earlier observation:
> * on Windows, the performance of glCopyTexSubImage is much better in
> the case of FBO.
Any idea why? Given that it is a unified driver, it sounds like we may
want to disable this code path on Windows, at least for NV, but I guess
we may also want to validate that on some other Nvidia cards to see
if it is a driver or a h/w limitation.
-phil.
>
> Thanks,
> Andrew
>
>>
>> -phil.
>>
>> On 06/18/2015 07:40 AM, Andrew Brygin wrote:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8087201
>>> Webrev: http://cr.openjdk.java.net/~bae/8087201/9/webrev.00/
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>> 18/06/15 17:39, Andrew Brygin wrote:
>>>> Hello,
>>>>
>>>> could you please review a fix for 8087201?
>>>>
>>>> The root of the problem is that we have to supply the content of the
>>>> destination surface to the LCD shader to compose the LCD glyph
>>>> correctly. In order to do this, we have to copy a sub-image from the
>>>> destination buffer to an intermediate texture using the
>>>> glCopyTexSubImage2D() routine. Unfortunately, this routine is quite
>>>> slow on the majority of systems, and it dramatically reduces the
>>>> overall speed of LCD text rendering.
>>>>
>>>> The main idea of the fix is to use the texture associated with the
>>>> destination surface, if it exists. In this case we have a chance to
>>>> avoid the data copying completely. However, we then have to avoid
>>>> read-after-write hazards in order to get correct results.
>>>> Fortunately, this can be achieved by using the
>>>> GL_NV_texture_barrier extension:
>>>>
>>>> https://www.opengl.org/registry/specs/NV/texture_barrier.txt
>>>>
>>>> Besides this, the suggested fix introduces the following changes in
>>>> the OGL text renderer:
>>>>
>>>> * Separate accelerated caches for LCD and AA glyphs
>>>> Currently we have a single cache which is initialized either for LCD
>>>> or for AA glyphs. If an application mixes these types of font
>>>> smoothing for some reason, we get a significant performance
>>>> degradation. For example, if we use J2DBench in GUI mode, the Swing
>>>> GUI initializes the accelerated cache for AA, and subsequent
>>>> rendering of LCD text always uses the 'no-cache' code path.
>>>>
>>>> * Increase the dimension of the glyph cache cell from 16x16 to 32x32.
>>>> This change gives a significant performance boost on systems with
>>>> retina displays (because of the average size of rendered glyphs).
>>>> However, on systems where the fast path with the destination texture
>>>> is not possible for any reason, this change may cause a performance
>>>> degradation because of more extensive usage of glCopyTexSubImage2D.
>>>> So, we may want a means to configure the cell dimension
>>>> depending on system capabilities.
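The cell-size trade-off can be made concrete with a little arithmetic. This sketch assumes a 512x512 cache texture and assumes that a glyph larger than one cell is rendered via the no-cache path; both are illustrative assumptions, and choose_cell_dim is a hypothetical policy of the kind suggested above, not existing code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed cache texture dimensions, for illustration only. */
enum { CACHE_TEX_W = 512, CACHE_TEX_H = 512 };

/* Number of fixed-size cells that fit into the cache texture. */
static int cache_capacity(int cell_w, int cell_h)
{
    assert(cell_w > 0 && cell_h > 0);
    return (CACHE_TEX_W / cell_w) * (CACHE_TEX_H / cell_h);
}

/* Assumption: a glyph is cached only if it fits into a single cell;
 * larger glyphs take the no-cache path. */
static bool fits_in_cell(int glyph_w, int glyph_h, int cell_dim)
{
    return glyph_w <= cell_dim && glyph_h <= cell_dim;
}

/* Hypothetical policy: use the larger cell only when destination reads
 * can avoid glCopyTexSubImage2D (texture-barrier fast path available). */
static int choose_cell_dim(bool fast_dst_path)
{
    return fast_dst_path ? 32 : 16;
}
```

Under these assumptions, 16x16 cells give 1024 slots and 32x32 cells give 256, so the larger cell trades capacity for letting typical 2x-scale glyphs (e.g. 20x24 pixels) stay on the cached path.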
>>>>
>>>> Performance results overview:
>>>> * MBP with Intel Iris (retina, texture barrier is available):
>>>> http://cr.openjdk.java.net/~bae/8087201/9/mbp-intel-iris.txt
>>>>
>>>> * iMac with AMD HD6750M (no retina, texture barrier is available):
>>>> http://cr.openjdk.java.net/~bae/8087201/9/imac-amd-hd6750m.txt
>>>>
>>>> * MBP with OS X 10.8, NV GF9600M (no retina, no texture barrier):
>>>> http://cr.openjdk.java.net/~bae/8087201/9/mbp-10.8-NVGF9600M.txt
>>>>
>>>> Please take a look.
>>>>
>>>> Thanks,
>>>> Andrew
>>>
>>
>