[OpenJDK 2D-Dev] [9] request for review: 8087201: OGL: rendering of lcd text is slow

Phil Race philip.race at oracle.com
Wed Jun 24 18:40:37 UTC 2015


On 6/24/15 8:18 AM, Andrew Brygin wrote:
> Hello Phil,
>
>  please see my comments inline.
>
> 23/06/15 21:29, Phil Race wrote:
>> Hi Andrew,
>>
>> Overall the fix looks good. A few questions.
>>
>> 1. Regarding translucent surfaces, do you know when Swing
>> has a translucent backbuffer and when it does not ?
>> It has been noted that we now have LCD text in some cases
>> in SS2 but apparently still not in NB ..
> I did not notice LCD text in the SwingSet2 demo without an explicit
> switch to opaque backbuffers in the RepaintManager.
>
> My expectation is that standard Swing components should not
> use LCD text on macosx at the moment. However, if there are
> (custom?) components which create opaque buffers separately
> from the RepaintManager, then they may be able to use LCD text.

See https://bugs.openjdk.java.net/browse/JDK-8098853 and/or ask Yuri.


>>
>> 2. Where are we likely to find (or not find) support for this extension ?
>>
>> Based on your results ironically, it seems that the Nvidia card is the
>> one case that did not support the extension. Is that because it was
>> an older version of OS X than the others ?
>
> Unfortunately, the extension is relatively new, and we need new drivers
> to use it. The MBP with the Nvidia GF9600M is running OSX 10.8,
> and there we cannot use the extension. However, the extension is listed
> as supported for the GF9600M in the extension database, so we can probably
> expect that an upgrade of OSX to 10.9 or 10.10 will make it available.

OK.
>
> The limited availability of the extension is the main reason to look for
> alternative solutions. The best option is to identify and eliminate the
> cause of the glCopyTexSubImage() slowness. There are some reasons to think
> that this is possible:
>  * a separate simple OGL demo shows almost equal performance for
>    glCopyTexSubImage() and re-using the FBO texture.
>  * on Windows, the performance of glCopyTexSubImage() is much better
>    in the case of FBO.
>
> However, at the moment I do not see what we are doing wrong or
> non-optimally with the standard approach.

By standard approach you mean what exactly ?

>>
>> 3. The performance 'lost' case.
>> >  However, on systems where the fast path with destination texture is not
>> >  possible for any reason, this change may cause a performance degradation
>> >  because of more extensive usage of glCopyTexSubImage2D.
>> >  So, we probably may want to get a means to configure the cell dimension
>>
>> Is this a reference to losing performance on non-retina displays
>> where we would be better off with the smaller cache cell size ?

Was the answer to this 'yes' ?
>>
>> I suppose the importance of this depends in part on the answer to question #2
> Probably the most important case here is old OSX (< 10.9) systems.

All the 8 updates support 10.8.3+ so I suppose that is the main case but
I expect that to 'go away' for JDK 9 or perhaps earlier once Apple stops
supporting it.

> Also, Windows systems with OGL drivers created before 2011 - 2012.
> However, OGL is an optional pipeline on Windows, so it could be less critical.

I think older windows drivers are something we would encourage everyone
to get off ASAP anyway ..
>>
>> 4. Have you tried this on Linux .. or even a Windows OGL driver ?
> I have uploaded results for a linux system with NVS5400:
> http://cr.openjdk.java.net/~bae/8087201/9/linux-x64-bench.txt
>
> Here we have the NV_texture_barrier extension, and see up to a 10x-20x
> speedup in some tests.
>
> On Windows, I have got mixed results:
> * Intel HD4000: no extension due to old drivers, so the same results as
>      without the fix.
> * NVS5400: with the fix we get similar scores in the tests as on macosx,
>      but the standard way with glCopyTexSubImage gives better results anyway.
>      I.e. with the fix, we achieve only 55% - 60% of the original performance.

That is very interesting. Is that related to your earlier observation :
 > * on windows, the performance of glCopyTexSubImage is much better in
 >   the case of FBO.

Any idea why ? Given that it is a unified driver it sounds like we may
want to disable this code path when on Windows, at least for NV, but I guess
we may also want to validate that on some other cards - from Nvidia - to
see if it is a driver or h/w limitation.
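
To make that suggestion concrete, here is a minimal sketch of such a gate. Everything in it is an assumption for illustration: the GfxCaps struct, its field names and use_dst_texture_path() are hypothetical and not part of the webrev, and the Windows/NVIDIA exclusion only reflects the single NVS5400 measurement above, so it would need validation on other cards first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability record; not a structure from the JDK sources. */
typedef struct {
    bool has_texture_barrier;  /* GL_NV_texture_barrier was reported */
    bool is_windows;           /* running on the Windows OGL pipeline */
    const char *gl_vendor;     /* e.g. the string from glGetString(GL_VENDOR) */
} GfxCaps;

/*
 * Returns true if the destination-texture fast path should be used.
 * Assumed policy: require the texture barrier, and fall back to the
 * glCopyTexSubImage2D path on Windows with an NVIDIA driver.
 */
static bool use_dst_texture_path(const GfxCaps *caps)
{
    assert(caps != NULL);
    if (!caps->has_texture_barrier) {
        return false;
    }
    if (caps->is_windows && caps->gl_vendor != NULL &&
        strstr(caps->gl_vendor, "NVIDIA") != NULL) {
        return false;
    }
    return true;
}
```

If other Nvidia cards turn out to behave differently, the vendor test could be narrowed to specific renderer strings or dropped altogether.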

-phil.
>
> Thanks,
> Andrew
>
>>
>> -phil.
>>
>> On 06/18/2015 07:40 AM, Andrew Brygin wrote:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8087201
>>> Webrev: http://cr.openjdk.java.net/~bae/8087201/9/webrev.00/
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>> 18/06/15 17:39, Andrew Brygin wrote:
>>>> Hello,
>>>>
>>>>  could you please review a fix for 8087201?
>>>>
>>>>  The root of the problem is that we have to supply the content of the
>>>>  destination surface to the LCD shader to compose the LCD glyph correctly.
>>>>  In order to do this, we have to copy a sub-image from the destination
>>>>  buffer to an intermediate texture using the glCopyTexSubImage2D() routine.
>>>>  Unfortunately, this routine is quite slow on the majority of systems, and
>>>>  it dramatically reduces the overall speed of LCD text rendering.
>>>>
>>>>  The main idea of the fix is to use the texture associated with the
>>>>  destination surface if it exists. In this case we have a chance to
>>>>  abandon the data copying completely. However, we have to avoid
>>>>  read-after-write hazards in order to get correct results in this case.
>>>>  Fortunately, this can be achieved by using the GL_NV_texture_barrier
>>>>  extension:
>>>>
>>>> https://www.opengl.org/registry/specs/NV/texture_barrier.txt
>>>>
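Since the fast path depends on the driver reporting GL_NV_texture_barrier, a minimal sketch of a whole-token check against a space-separated extension list may help here. The helper name has_extension() is hypothetical and is not taken from the webrev; how the fix itself detects the extension is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `name` appears as a whole token in
 * the space-separated list `ext_list`. A plain strstr() is not enough,
 * because one extension name can be a prefix of another.
 */
static bool has_extension(const char *ext_list, const char *name)
{
    assert(name != NULL && *name != '\0');
    if (ext_list == NULL) {
        return false;
    }
    size_t len = strlen(name);
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) {
            return true;
        }
        p += len;  /* skip this partial match and keep searching */
    }
    return false;
}
```

In a compatibility context the list would typically come from glGetString(GL_EXTENSIONS). With the extension present, the spec linked above provides glTextureBarrierNV() to order rendering against later reads of the same texture.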
>>>> Besides this, the suggested fix introduces the following changes in the
>>>> OGL text renderer:
>>>>
>>>> * Separate accelerated caches for LCD and AA glyphs.
>>>>    Currently we have a single cache which is initialized either for LCD
>>>>    or for AA glyphs. If an application mixes these types of font smoothing
>>>>    for some reason, we get a significant performance degradation.
>>>>    For example, if we use J2DBench in GUI mode, then the Swing GUI
>>>>    initializes the accelerated cache for AA, and subsequent rendering of
>>>>    LCD text always uses the 'no-cache' code path.
>>>>
>>>> * Increase the dimension of the glyph cache cell from 16x16 to 32x32.
>>>>    This change gives a significant performance boost on systems with
>>>>    retina displays (because of the average size of rendered glyphs).
>>>>    However, on systems where the fast path with the destination texture
>>>>    is not possible for any reason, this change may cause a performance
>>>>    degradation because of more extensive usage of glCopyTexSubImage2D.
>>>>    So, we probably want a means to configure the cell dimension
>>>>    depending on system capabilities.
>>>>
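One way to picture the cell-dimension trade-off is the sketch below. It is only an illustration under stated assumptions: choose_cell_size() and cache_capacity() are hypothetical names, the policy (32x32 only when the destination-texture fast path is usable) is one possible reading of the paragraph above, and the 512x512 cache texture used in the note after the code is an assumed size, not a value given in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* The two cell dimensions named in the thread. */
enum { CELL_SMALL = 16, CELL_LARGE = 32 };

/*
 * Hypothetical policy: use the larger cell only when the destination
 * texture fast path is available, so that systems which still rely on
 * glCopyTexSubImage2D keep the smaller cell.
 */
static int choose_cell_size(bool dst_texture_fast_path)
{
    return dst_texture_fast_path ? CELL_LARGE : CELL_SMALL;
}

/* Number of square cells that fit in a cache texture of the given size. */
static int cache_capacity(int tex_width, int tex_height, int cell)
{
    assert(cell > 0);
    return (tex_width / cell) * (tex_height / cell);
}
```

For an assumed 512x512 cache texture, 16x16 cells give 1024 slots and 32x32 cells give 256, so the larger cell also reduces how many glyphs can be cached at once.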
>>>> Performance results overview:
>>>> * MBP with Intel Iris (retina, texture barrier is available):
>>>> http://cr.openjdk.java.net/~bae/8087201/9/mbp-intel-iris.txt
>>>>
>>>> * iMac with AMD HD6750M (no retina, texture barrier is available):
>>>> http://cr.openjdk.java.net/~bae/8087201/9/imac-amd-hd6750m.txt
>>>>
>>>> * MBP with OSX10.8, NV GF9600M (no retina, no texture barrier):
>>>> http://cr.openjdk.java.net/~bae/8087201/9/mbp-10.8-NVGF9600M.txt
>>>>
>>>> Please take a look.
>>>>
>>>> Thanks,
>>>> Andrew
>>>
>>
>



