[OpenJDK 2D-Dev] [9] request for review: 8087201: OGL: rendering of lcd text is slow
Sergey Bylokhov
Sergey.Bylokhov at oracle.com
Wed Jun 24 19:45:43 UTC 2015
Hi, Andrew.
Thanks for this report. As far as I understand, on retina the lcd text
is now drawn faster after the fix than aa text was before the fix, which
means that we will not get any new regressions. So the fix looks fine.
But on non-retina our results are still not so good, lcd text is slow:
485 (was 16.4) vs 16508 for aa, so the window for optimizations still exists.
global.dest=VolatileImg(Opaque),text.opts.font.fsize=6.0,text.opts.graphics.textaa=LCD_HRGB:
9-8087201-v00: 485.2052560 (var=0.57%) (2955.82%)
**|*********************************************************
**|*********************************************************
**|*********************************************************
global.dest=VolatileImg(Opaque),text.opts.font.fsize=6.0,text.opts.graphics.textaa=On:
9-8087201-v00: 16508.76580 (var=0.66%) (99.69%)
************************************************************|
************************************************************|
*********************************************************** |
On 19.06.15 15:54, Andrew Brygin wrote:
> Hello Sergey,
>
> the only part of the fix that affects the performance of the AA case is
> the cache cell size.
> In the case of retina, 13pt and 20pt glyphs do not fit the 16x16 cache
> cells, so these benchmarks show better performance:
> 13pt: 40-80 times faster
> 20pt: 7-13 times faster
>
> 6pt shows the same results, because it fits the cache in any case.
>
> Full benchmark results:
> http://cr.openjdk.java.net/~bae/8087201/9/ogl-lcd-aa.res
>
> Regarding the suggestion to create a separate method for the fast
> path possibility check: please note that we do this check and calculate
> the dstTextureID only once per whole glyph vector, but use the
> dstTextureID as an indicator for every glyph. So such a change will
> affect performance for sure.
> Probably we can hide the 'dstTextureID == 0' condition behind some
> sort of macro, like canReadDestinationDirectly() or something like
> this.
> Are you OK with this?
>
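A minimal sketch of the macro idea, to make the discussion concrete. It is not the code from the webrev: the DstSurface type, the selectDstTextureID() helper and the macro name are hypothetical, and it assumes that a non-zero dstTextureID is what marks the fast path. The point is only that the capability check stays outside the per-glyph loop while the per-glyph test gets a readable name and adds no call overhead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the destination surface data; the real
 * structure used by the renderer has more fields. */
typedef struct {
    unsigned int textureID;  /* 0 if the surface has no backing texture */
    int isTexture2D;         /* stands in for textureTarget == GL_TEXTURE_2D */
} DstSurface;

/* Assumption: a non-zero ID means the shader may read the destination
 * texture directly, so the copy can be skipped. Being a macro, the
 * per-glyph test expands to a plain comparison. */
#define CAN_READ_DESTINATION_DIRECTLY(dstTextureID) ((dstTextureID) != 0)

/* Meant to be evaluated once per glyph vector, not once per glyph.
 * hasTextureBarrier stands in for the texture barrier capability check. */
unsigned int selectDstTextureID(const DstSurface *dst, int hasTextureBarrier)
{
    assert(dst != NULL);
    return (hasTextureBarrier && dst->isTexture2D) ? dst->textureID : 0;
}
```

The per-glyph code would then test CAN_READ_DESTINATION_DIRECTLY(dstTextureID) instead of comparing the ID with 0 directly.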
> Thanks,
> Andrew
>
> 19/06/15 13:57, Sergey Bylokhov wrote:
>> Hi, Andrew.
>> Can you additionally provide the bench data about aa (before/after the
>> fix) vs the new lcd?
>>
>> Probably it will be more obvious if the code in OGLTextRenderer
>>     if (OGLC_IS_CAP_PRESENT(oglc, CAPS_EXT_TEXBARRIER) &&
>>         dstOps->textureTarget == GL_TEXTURE_2D)
>>
>> is moved to a separate method, so that the check for the possibility
>> of a fast blit is stated explicitly instead of:
>>     if (dstTextureID == 0) {
>>
>> Also your review request contains useful information like
>> fast/slow/read-after-write etc. I think this information can be
>> useful as comments in the code.
>>
>> On 18.06.15 17:39, Andrew Brygin wrote:
>>> Hello,
>>>
>>> could you please review a fix for 8087201?
>>>
>>> The root of the problem is that we have to supply the content of the
>>> destination surface to the lcd shader to compose the lcd glyph correctly.
>>> In order to do this, we have to copy a sub-image from the destination
>>> buffer to an intermediate texture using the glCopyTexSubImage2D() routine.
>>> Unfortunately, this routine is quite slow on the majority of systems,
>>> and it dramatically reduces the overall speed of lcd text rendering.
>>>
>>> The main idea of the fix is to use a texture associated with the
>>> destination surface if it exists. In this case we have a chance to
>>> completely abandon the data copying. However, we have to avoid
>>> read-after-write in order to get correct results in this case.
>>> Fortunately, this can be achieved by using the
>>> GL_NV_texture_barrier extension:
>>>
>>> https://www.opengl.org/registry/specs/NV/texture_barrier.txt
>>>
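The two ways of supplying the destination content described above can be sketched as follows. This is an illustration under stated assumptions, not the code under review: PathStats and both function names are hypothetical, the counters stand in for the real GL calls, and issuing one barrier per glyph is an assumption made to keep the model simple.

```c
#include <assert.h>

/* Counters standing in for the GL calls that the real renderer would make. */
typedef struct {
    int copies;    /* times the destination sub-image would be copied */
    int barriers;  /* times a texture barrier would be issued */
} PathStats;

/* Makes the destination content available to the lcd shader for one glyph. */
void prepareDestinationForGlyph(unsigned int dstTextureID, PathStats *stats)
{
    if (dstTextureID != 0) {
        /* Fast path: the shader samples the destination texture itself.
         * A barrier from GL_NV_texture_barrier would go here so that
         * earlier writes are visible before they are read back. */
        stats->barriers++;
    } else {
        /* Slow path: glCopyTexSubImage2D() would copy the glyph-sized
         * region into an intermediate texture here. */
        stats->copies++;
    }
}

/* dstTextureID is decided once for the whole glyph vector and then
 * only consulted for each glyph. */
PathStats renderGlyphVector(unsigned int dstTextureID, int glyphCount)
{
    PathStats stats = { 0, 0 };
    assert(glyphCount >= 0);
    for (int i = 0; i < glyphCount; i++) {
        prepareDestinationForGlyph(dstTextureID, &stats);
    }
    return stats;
}
```

With a non-zero dstTextureID no copy is performed at all, which is where the speedup is expected to come from.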
>>> Besides this, the suggested fix introduces the following changes in
>>> the OGL text renderer:
>>>
>>> * Separate accelerated caches for LCD and AA glyphs
>>> Currently we have a single cache, which is initialized either for
>>> LCD or for AA glyphs. If an application mixes these types of font
>>> smoothing for some reason, we get a significant performance
>>> degradation.
>>> For example, if we use J2DBench in GUI mode, then the Swing GUI
>>> initializes the accelerated cache for AA, and subsequent rendering
>>> of LCD text always uses the 'no-cache' code path.
>>>
>>> * Increase dimension of the glyph cache cell from 16x16 to 32x32.
>>> This change gives a significant performance boost on systems with
>>> retina (because of the average size of rendered glyphs).
>>> However, on systems where the fast path with the destination texture
>>> is not possible for any reason, this change may cause a performance
>>> degradation because of more extensive usage of glCopyTexSubImage2D.
>>> So, we probably may want to get a means to configure the cell
>>> dimension depending on system capabilities.
>>>
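The two cache changes listed above can be sketched like this. All names here are hypothetical, the glyph sizes used with it are made up for illustration, and chooseCellDim() is only one possible policy for the open question about configuring the cell dimension; it is not something the fix is stated to implement.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CACHE_AA = 0, CACHE_LCD = 1, CACHE_KIND_COUNT } CacheKind;

typedef struct {
    int initialized;
    int cellDim;  /* each cell is cellDim x cellDim pixels */
} GlyphCache;

/* One cache per smoothing type, so that initializing the AA cache first
 * does not push later LCD text onto the 'no-cache' path. */
static GlyphCache caches[CACHE_KIND_COUNT];

/* Lazily initializes and returns the cache for the given kind.
 * cellDim only takes effect on the first call for that kind. */
GlyphCache *getGlyphCache(CacheKind kind, int cellDim)
{
    GlyphCache *cache = &caches[kind];
    if (!cache->initialized) {
        cache->cellDim = cellDim;
        cache->initialized = 1;
    }
    return cache;
}

/* A glyph that does not fit a cell cannot be cached. */
int glyphFitsCell(const GlyphCache *cache, int glyphWidth, int glyphHeight)
{
    assert(cache != NULL);
    return glyphWidth <= cache->cellDim && glyphHeight <= cache->cellDim;
}

/* Hypothetical policy: use the larger cell only when the destination
 * texture can be read directly, because larger cells mean larger copies
 * on the slow path. */
int chooseCellDim(int fastPathAvailable)
{
    return fastPathAvailable ? 32 : 16;
}
```

For example, a glyph of 20x24 pixels fits a 32x32 cell but not a 16x16 one, so with the smaller cell it would always be rendered uncached.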
>>> Performance results overview:
>>> * MBP with Intel Iris (retina, texture barrier is available):
>>> http://cr.openjdk.java.net/~bae/8087201/9/mbp-intel-iris.txt
>>>
>>> * iMac with AMD HD6750M (no retina, texture barrier is available):
>>> http://cr.openjdk.java.net/~bae/8087201/9/imac-amd-hd6750m.txt
>>>
>>> * MBP with OSX10.8, NV GF9600M (no retina, no texture barrier):
>>> http://cr.openjdk.java.net/~bae/8087201/9/mbp-10.8-NVGF9600M.txt
>>>
>>> Please take a look.
>>>
>>> Thanks,
>>> Andrew
>>
>>
>
--
Best regards, Sergey.