[OpenJDK 2D-Dev] [9] request for review: 8087201: OGL: rendering of lcd text is slow
Andrew Brygin
andrew.brygin at oracle.com
Thu Jun 25 10:33:15 UTC 2015
24/06/15 21:40, Phil Race wrote:
> On 6/24/15 8:18 AM, Andrew Brygin wrote:
>> Hello Phil,
>>
>> please see my comments inline.
>>
>> 23/06/15 21:29, Phil Race wrote:
>>> Hi Andrew,
>>>
>>> Overall the fix looks good. A few questions.
>>>
>>> 1. Regarding translucent surfaces, do you know when Swing
>>> has a translucent backbuffer and when it does not ?
>>> It has been noted that we now have LCD text in some cases
>>> in SS2 but apparently still not in NB ..
>> I did not notice lcd text in the SwingSet2 demo without an explicit
>> switch to opaque backbuffers in the RepaintManager.
>>
>> My expectation is that standard swing components should not
>> use lcd text on macosx at the moment. However, if there are
>> (custom?) components which create opaque buffers separately
>> from the RepaintManager, then they may be able to use lcd text.
>
> See https://bugs.openjdk.java.net/browse/JDK-8098853 and/or ask Yuri.
>
>
>>>
>>> 2. Where are we likely to find (or not find) support for this
>>> extension ?
>>>
>>> Based on your results ironically, it seems that the Nvidia card is the
>>> one case that did not support the extension. Is that because it was
>>> an older version of OS X than the others ?
>>
>> Unfortunately, the extension is relatively new, and we need new
>> drivers to use it. The mbp with nvidia GF9600M is running OSX 10.8,
>> and there we cannot use the extension. However, this extension is
>> listed as supported for the GF9600M in the extension database, and
>> we can probably expect that an upgrade of OSX to 10.9 or 10.10 will
>> make it available.
>
> OK.
>>
>> The limited availability of the extension is the main reason to look
>> for alternative solutions. The best option is to identify and
>> eliminate the cause of the glCopyTexSubImage() slowness. There are
>> some reasons to think that this is possible:
>> * a separate simple OGL demo shows almost equal performance for
>> glCopyTexSubImage() and re-using the FBO texture.
>> * on windows, the performance of glCopyTexSubImage() is much better
>> in the case of FBO.
>>
>> However, at the moment I do not see what we are doing wrong or
>> non-optimally with the standard approach.
>
> By standard approach you mean what exactly ?
>
Getting the destination content by using glCopyTexSubImage().
>>>
>>> 3. The performance 'lost' case.
>>> > However, on systems where the fast path with destination texture
>>> > is not possible for any reason, this change may cause a
>>> > performance degradation because of more extensive usage of
>>> > glCopyTexSubImage2D.
>>> > So, we probably may want to get a means to configure the cell
>>> > dimension
>>>
>>> Is this a reference to losing performance on non-retina displays
>>> where we would be better off with the smaller cache cell size ?
>
> Was the answer to this 'yes' ?
Yes, this is one example. Any case where the destination surface data
does not have an underlying texture will suffer from excessive reading
with a bigger cell size.
The screen surface data on Windows illustrates this.
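
To put a rough number on "excessive reading": the sketch below assumes,
purely for illustration, that each fallback copy covers one whole square
cache cell. Under that assumption, going from 16x16 to 32x32 cells
quadruples the pixels read back per glyph. The helper names are
hypothetical and are not taken from the webrev; the real copy size in
the renderer may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pixels read back per glyph if one
 * glCopyTexSubImage2D() call covered a whole square cache cell.
 * This is an assumption for illustration, not the actual copy size. */
static size_t cell_read_pixels(size_t cell_dim)
{
    return cell_dim * cell_dim;
}

/* How many times more pixels are read with the new cell size than
 * with the old one (integer ratio). */
static size_t cell_read_growth(size_t old_dim, size_t new_dim)
{
    assert(old_dim > 0); /* avoid division by zero */
    return cell_read_pixels(new_dim) / cell_read_pixels(old_dim);
}
```

With these definitions, cell_read_growth(16, 32) evaluates to 4.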
>>>
>>> I suppose the importance of this depends in part on the answer to
>>> question #2
>> Probably, the most important case here is old OSX (< 10.9) systems.
>
> All the 8 updates support 10.8.3+ so I suppose that is the main case but
> I expect that to 'go away' for JDK 9 or perhaps earlier once Apple
> stop supporting it.
>
>> Also, windows systems with OGL drivers created before 2011 - 2012.
>> However, OGL is an optional pipeline on windows, so it could be less
>> critical.
>
> I think older windows drivers are something we would encourage everyone
> to get off ASAP anyway ..
>>>
>>> 4. Have you tried this on Linux .. or even a Windows OGL driver ?
>> I have uploaded results for a linux system with NVS5400:
>> http://cr.openjdk.java.net/~bae/8087201/9/linux-x64-bench.txt
>>
>> Here we have the NV_texture_barrier extension, and get up to a
>> 10x-20x speedup in some tests.
>>
>> On windows, I have got mixed results:
>> * Intel HD4000: no extension due to old drivers, so the same results
>> as without the fix.
>> * NVS5400: with the fix we get scores similar to those on macosx,
>> but the standard way with glCopyTexSubImage gives better results
>> anyway. I.e. with the fix, we achieve only 55% - 60% of the
>> original performance.
>
> That is very interesting. Is that related to your earlier observation :
> > * on windows, the performance of glCopyTexSubImage is much better in
> > the case of FBO.
Yes, exactly.
>
> Any idea why ?
Unfortunately, at the moment I do not see a reason for this.
I have played with the idea of varying our pixel store/pixel transfer
settings in order to achieve better performance with
glCopyTexSubImage(), but I do not see any benefit so far.
Another thing I am going to try is to postpone the shader program
activation in order to read the destination without an active shader.
However, this is also a shot in the dark...
> Given that it is a unified driver it sounds like we may want
> to disable this code path when on windows at least for NV but I guess we
> may also want to validate that on some other cards - from Nvidia - to
> see if it is a driver or h/w limitation.
Probably, we should run the text benchmarks on a relatively big set
of windows machines, and if we see that good performance of
glCopyTexSubImage() is the rule rather than the exception, then we can
just disable the new code path on windows.
What do you think?
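
One way such a decision could be made from the benchmark data is
sketched below. Everything in it is an assumption for illustration: the
bench_result type, the function name, and the "more than half of the
machines" threshold are hypothetical, and no real scores are implied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-machine result: scores for the same text test using
 * the glCopyTexSubImage() path and the texture-barrier path.
 * A higher score means faster rendering. */
typedef struct {
    double copy_score;
    double barrier_score;
} bench_result;

/* Returns 1 if the copy path wins on more than half of the machines
 * (i.e. it is the rule rather than the exception), 0 otherwise.
 * The majority threshold is an assumption, not an agreed policy. */
static int should_disable_barrier_path(const bench_result *r, size_t n)
{
    size_t copy_wins = 0;
    size_t i;

    assert(r != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (r[i].copy_score > r[i].barrier_score) {
            copy_wins++;
        }
    }
    return n > 0 && copy_wins * 2 > n;
}
```

An empty result set returns 0, so the new code path stays enabled until
there is data that argues against it.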
Thanks,
Andrew
> -phil.
>>
>> Thanks,
>> Andrew
>>
>>>
>>> -phil.
>>>
>>> On 06/18/2015 07:40 AM, Andrew Brygin wrote:
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8087201
>>>> Webrev: http://cr.openjdk.java.net/~bae/8087201/9/webrev.00/
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>
>>>> 18/06/15 17:39, Andrew Brygin wrote:
>>>>> Hello,
>>>>>
>>>>> could you please review a fix for 8087201?
>>>>>
>>>>> The root of the problem is that we have to supply the content of
>>>>> the destination surface to the lcd shader to compose the lcd glyph
>>>>> correctly. In order to do this, we have to copy a sub-image from
>>>>> the destination buffer to an intermediate texture using the
>>>>> glCopyTexSubImage2D() routine. Unfortunately, this routine is
>>>>> quite slow on the majority of systems, and it dramatically reduces
>>>>> the overall speed of lcd text rendering.
>>>>>
>>>>> The main idea of the fix is to use a texture associated with the
>>>>> destination surface if it exists. In this case we have a chance to
>>>>> completely abandon the data copying. However, we have to avoid
>>>>> read-after-write in order to get correct results in this case.
>>>>> Fortunately, this can be achieved by using the
>>>>> GL_NV_texture_barrier extension:
>>>>>
>>>>> https://www.opengl.org/registry/specs/NV/texture_barrier.txt
>>>>>
>>>>> Besides this, the suggested fix introduces the following changes
>>>>> in the OGL text renderer:
>>>>>
>>>>> * Separate accelerated caches for LCD and AA glyphs
>>>>> We have a single cache which is initialized either for LCD or
>>>>> for AA glyphs. If an application mixes these types of font
>>>>> smoothing for some reason, we get a significant performance
>>>>> degradation.
>>>>> For example, if we use J2DBench in GUI mode, then the swing GUI
>>>>> initializes the accelerated cache for AA, and subsequent
>>>>> rendering of LCD text always uses the 'no-cache' code path.
>>>>>
>>>>> * Increase dimension of the glyph cache cell from 16x16 to 32x32.
>>>>> This change gives a significant performance boost on systems with
>>>>> retina (because of the average size of rendered glyphs).
>>>>> However, on systems where the fast path with destination
>>>>> texture is not possible for any reason, this change may cause a
>>>>> performance degradation because of more extensive usage of
>>>>> glCopyTexSubImage2D.
>>>>> So, we probably may want to get a means to configure the cell
>>>>> dimension depending on system capabilities.
>>>>>
>>>>> Performance results overview:
>>>>> * MBP with Intel Iris (retina, texture barrier is available):
>>>>> http://cr.openjdk.java.net/~bae/8087201/9/mbp-intel-iris.txt
>>>>>
>>>>> * iMac with AMD HD6750M (no retina, texture barrier is available):
>>>>> http://cr.openjdk.java.net/~bae/8087201/9/imac-amd-hd6750m.txt
>>>>>
>>>>> * MBP with OSX10.8, NV GF9600M (no retina, no texture barrier):
>>>>> http://cr.openjdk.java.net/~bae/8087201/9/mbp-10.8-NVGF9600M.txt
>>>>>
>>>>> Please take a look.
>>>>>
>>>>> Thanks,
>>>>> Andrew
>>>>
>>>
>>
>
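
For reference, the availability check for GL_NV_texture_barrier
discussed in the quoted thread can be sketched as a whole-token search
in the space-separated string returned by glGetString(GL_EXTENSIONS),
in contexts where that call is available. This is a minimal sketch, not
the code from the webrev; the function name has_gl_extension is
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if ext_name appears as a whole space-separated token in
 * ext_string, 0 otherwise. A plain substring match is not enough,
 * because one extension name can be a prefix of another. */
static int has_gl_extension(const char *ext_string, const char *ext_name)
{
    size_t name_len;
    const char *p;

    if (ext_string == NULL || ext_name == NULL || *ext_name == '\0') {
        return 0;
    }
    name_len = strlen(ext_name);
    p = ext_string;
    while ((p = strstr(p, ext_name)) != NULL) {
        int starts_token = (p == ext_string) || (p[-1] == ' ');
        int ends_token = (p[name_len] == ' ') || (p[name_len] == '\0');
        if (starts_token && ends_token) {
            return 1;
        }
        p++; /* keep searching after this partial match */
    }
    return 0;
}
```

The fast path with the destination texture would only be considered
when this check succeeds; otherwise the renderer would stay on the
glCopyTexSubImage2D() path.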