[OpenJDK 2D-Dev] [9] request for review: 8087201: OGL: rendering of lcd text is slow

Wed Jun 24 19:45:43 UTC 2015

Hi, Andrew.
Thanks for this report. As far as I understand, on retina the LCD text
is drawn faster after the fix than AA text was before the fix, which
means that we will not get new regressions. So the fix looks fine.

But on non-retina our results are still not so good; LCD text is slow:
485 (was 16.4) vs 16508 for AA, so there is still room for optimization.

global.dest=VolatileImg(Opaque),text.opts.font.fsize=6.0,text.opts.graphics.textaa=LCD_HRGB:
9-8087201-v00: 485.2052560 (var=0.57%) (2955.82%)
global.dest=VolatileImg(Opaque),text.opts.font.fsize=6.0,text.opts.graphics.textaa=On:
9-8087201-v00: 16508.76580 (var=0.66%) (99.69%)
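For scale, a quick check of the 6pt figures. It uses the rounded 16.4 baseline quoted above, so the first ratio differs slightly from the reported 2955.82%. The fix makes LCD text roughly 30 times faster, yet AA text is still roughly 34 times faster than LCD:

```latex
\frac{485.21}{16.4} \approx 29.6
\qquad
\frac{16508.77}{485.21} \approx 34.0
```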

On 19.06.15 15:54, Andrew Brygin wrote:
> Hello Sergey,
>
>  the only part of the fix that affects the performance of the AA case
>  is the cache cell size.
>  In the retina case, 13pt and 20pt glyphs do not fit into 16x16 cache
>  cells, so these benchmarks show better performance:
>  13pt: 40-80 times faster
>  20pt: 7-13 times faster
>
>  6pt shows the same results, because it fits into the cache in either case.
>
>  Full benchmark results:
>  http://cr.openjdk.java.net/~bae/8087201/9/ogl-lcd-aa.res
>
>  Regarding the suggestion to create a separate method for the fast-path
>  possibility check: please note that we do this check and calculate
>  the dstTextureID only once per whole glyph vector, but use the
>  dstTextureID as an indicator for every glyph. So such a change would
>  certainly affect performance.
>  Probably we can hide the 'dstTextureID == 0' condition behind some
>  sort of macro, like canReadDestinationDirectly() or something similar.
>  Are you OK with this?
>
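A sketch of the trade-off described above, with hypothetical names (resolveDestinationTexture, drawGlyphVector, Stats) and counters in place of real GL work. The capability check runs once per glyph vector, while each glyph only tests the cached dstTextureID through a canReadDestinationDirectly()-style macro, which expands to the same cheap comparison as the bare 'dstTextureID == 0' test. This illustrates the idea; it is not the actual OGLTextRenderer code.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int GLuint;

/* Hypothetical macro giving the cheap per-glyph test a readable name.
 * A zero texture ID means "no usable destination texture, use the copy path". */
#define canReadDestinationDirectly(dstTextureID) ((dstTextureID) != 0)

/* Counters standing in for real work, so the control flow can be checked. */
typedef struct {
    int capabilityChecks; /* how often the per-vector check ran */
    int directReads;      /* glyphs composed by sampling the destination texture */
    int copies;           /* glyphs that needed a glCopyTexSubImage2D-style copy */
} Stats;

/* Stand-in for the capability check (extension present, GL_TEXTURE_2D
 * target, ...). Returns the destination texture ID, or 0 for the copy path. */
static GLuint resolveDestinationTexture(int fastPathPossible, GLuint textureID,
                                        Stats *stats)
{
    stats->capabilityChecks++;
    return fastPathPossible ? textureID : 0;
}

static void drawGlyphVector(int numGlyphs, int fastPathPossible,
                            GLuint textureID, Stats *stats)
{
    assert(numGlyphs >= 0 && stats != NULL);

    /* Evaluated once for the whole glyph vector... */
    GLuint dstTextureID =
        resolveDestinationTexture(fastPathPossible, textureID, stats);

    for (int i = 0; i < numGlyphs; i++) {
        /* ...while each glyph only tests the cached result. */
        if (canReadDestinationDirectly(dstTextureID)) {
            stats->directReads++;
        } else {
            stats->copies++;
        }
    }
}
```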
> Thanks,
> Andrew
>
> 19/06/15 13:57, Sergey Bylokhov wrote:
>> Hi, Andrew.
>> Can you additionally provide the benchmark data for AA (before/after
>> the fix) vs the new LCD?
>>
>> It would probably be clearer if the code in OGLTextRenderer
>> (lines 1007-1008):
>>     if (OGLC_IS_CAP_PRESENT(oglc, CAPS_EXT_TEXBARRIER) &&
>>         dstOps->textureTarget == GL_TEXTURE_2D)
>>
>> were moved to a separate method, and the check for the possibility
>> of the fast blit were made explicit, instead of:
>>     if (dstTextureID == 0) {
>>
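A minimal sketch of the separate method suggested here, assuming simplified stand-in types. The real OGLContext and OGLSDOps in OpenJDK contain much more; the bit value of CAPS_EXT_TEXBARRIER, the caps and textureID fields, and the names OGLTR_CanUseDestinationTexture / OGLTR_GetDestinationTextureID are assumptions for illustration. Only the condition itself is taken from the quoted code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles on its own (not the real definitions). */
typedef unsigned int GLuint;
typedef unsigned int GLenum;
#define GL_TEXTURE_2D            0x0DE1
#define GL_TEXTURE_RECTANGLE_ARB 0x84F5
#define CAPS_EXT_TEXBARRIER      (1 << 0) /* assumed bit value */

typedef struct { int caps; } OGLContext;
typedef struct { GLenum textureTarget; GLuint textureID; } OGLSDOps;

#define OGLC_IS_CAP_PRESENT(oglc, cap) ((((oglc)->caps) & (cap)) != 0)

/* Hypothetical helper: can the LCD shader sample the destination surface's
 * own texture instead of a copy made with glCopyTexSubImage2D()? */
static int OGLTR_CanUseDestinationTexture(const OGLContext *oglc,
                                          const OGLSDOps *dstOps)
{
    assert(oglc != NULL && dstOps != NULL);
    return OGLC_IS_CAP_PRESENT(oglc, CAPS_EXT_TEXBARRIER) &&
           dstOps->textureTarget == GL_TEXTURE_2D;
}

/* Returns the texture to read from, or 0 to request the copy path.
 * A surface without an associated texture (textureID == 0) also ends up
 * on the copy path. */
static GLuint OGLTR_GetDestinationTextureID(const OGLContext *oglc,
                                            const OGLSDOps *dstOps)
{
    return OGLTR_CanUseDestinationTexture(oglc, dstOps) ? dstOps->textureID : 0;
}
```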
>> Also, your review request contains useful information, such as the
>> fast/slow paths, read-after-write, etc. I think this information
>> would be useful as comments in the code.
>>
>> On 18.06.15 17:39, Andrew Brygin wrote:
>>> Hello,
>>>
>>>  could you please review a fix for 8087201?
>>>
>>>  The root of the problem is that we have to supply the content of the
>>>  destination surface to the LCD shader to compose the LCD glyph correctly.
>>>  In order to do this, we have to copy a sub-image from the destination
>>>  buffer to an intermediate texture using the glCopyTexSubImage2D() routine.
>>>  Unfortunately, this routine is quite slow on the majority of systems,
>>>  and it dramatically reduces the overall speed of LCD text rendering.
>>>
>>>  The main idea of the fix is to use the texture associated with the
>>>  destination surface, if it exists. In this case we have a chance to
>>>  abandon the data copying completely. However, we have to avoid
>>>  read-after-write hazards in order to get correct results. Fortunately,
>>>  this can be achieved by using the GL_NV_texture_barrier extension:
>>>
>>> https://www.opengl.org/registry/specs/NV/texture_barrier.txt
>>>
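To illustrate the two per-glyph paths, here is a sketch that only logs operation names instead of calling OpenGL, so it runs without a GL context. glCopyTexSubImage2D and glTextureBarrierNV (the entry point defined by GL_NV_texture_barrier) are real, but the one-barrier-per-glyph placement, the drawGlyphQuad placeholder, and the helper names are assumptions for illustration, not the code of the fix.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPS 64

/* Records which operation each glyph would issue; nothing calls OpenGL. */
typedef struct {
    const char *ops[MAX_OPS];
    int count;
} OpLog;

static void logOp(OpLog *opLog, const char *op)
{
    assert(opLog->count < MAX_OPS);
    opLog->ops[opLog->count++] = op;
}

static int countOps(const OpLog *opLog, const char *op)
{
    int n = 0;
    for (int i = 0; i < opLog->count; i++) {
        if (strcmp(opLog->ops[i], op) == 0) {
            n++;
        }
    }
    return n;
}

/* For every glyph, the LCD shader needs the destination pixels under it. */
static void renderLcdGlyphs(OpLog *opLog, int numGlyphs, int useDestinationTexture)
{
    for (int i = 0; i < numGlyphs; i++) {
        if (useDestinationTexture) {
            /* Fast path: sample the destination texture itself. The barrier
             * makes texels written by earlier draws visible to texture
             * fetches, so overlapping glyphs read correct values. */
            logOp(opLog, "glTextureBarrierNV");
        } else {
            /* Slow path: copy the destination sub-image into an
             * intermediate texture before drawing. */
            logOp(opLog, "glCopyTexSubImage2D");
        }
        logOp(opLog, "drawGlyphQuad"); /* placeholder for the glyph draw */
    }
}
```

The barrier is a synchronization point rather than a pixel transfer, which is why the fast path can avoid the copy cost entirely.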
>>> Besides this, the suggested fix introduces the following changes in the
>>> OGL text renderer:
>>>
>>> * Separate accelerated caches for LCD and AA glyphs
>>>    Currently we have a single cache which is initialized either for LCD
>>>    or for AA glyphs. If an application mixes these types of font
>>>    smoothing for some reason, we get a significant performance
>>>    degradation.
>>>    For example, if we use J2DBench in GUI mode, then the Swing GUI
>>>    initializes the accelerated cache for AA, and subsequent rendering
>>>    of LCD text always uses the 'no-cache' code path.
>>>
>>> * Increase the dimension of the glyph cache cell from 16x16 to 32x32.
>>>    This change gives a significant performance boost on systems with
>>>    retina displays (because of the average size of rendered glyphs).
>>>    However, on systems where the fast path with the destination texture
>>>    is not possible for any reason, this change may cause a performance
>>>    degradation because of more extensive usage of glCopyTexSubImage2D.
>>>    So we may want a means to configure the cell dimension depending
>>>    on system capabilities.
>>>
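The two changes above can be sketched together with a toy model in which all names are hypothetical and no GL calls are made. It compares a single cache typed by whichever text type arrives first with one cache per glyph type. It also includes a capability-based choice of cell dimension, which the text only suggests as a possible future option, so chooseCellDimension is an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { GLYPH_AA, GLYPH_LCD } GlyphType;

/* Toy cache: records only its glyph type and cell dimension. */
typedef struct {
    int initialized;
    GlyphType type;
    int cellDim; /* cells are cellDim x cellDim pixels */
} GlyphCache;

/* Hypothetical policy: 32x32 cells when the destination texture can be read
 * directly, otherwise keep 16x16 to avoid the extra glCopyTexSubImage2D
 * work mentioned above. */
static int chooseCellDimension(int fastPathAvailable)
{
    return fastPathAvailable ? 32 : 16;
}

static void initCache(GlyphCache *cache, GlyphType type, int fastPathAvailable)
{
    cache->initialized = 1;
    cache->type = type;
    cache->cellDim = chooseCellDimension(fastPathAvailable);
}

/* Old scheme: one cache typed by the first request; the other glyph type
 * gets NULL, i.e. the 'no-cache' code path. */
static GlyphCache singleCache;

static GlyphCache *getCacheSingle(GlyphType type, int fastPathAvailable)
{
    if (!singleCache.initialized) {
        initCache(&singleCache, type, fastPathAvailable);
    }
    return singleCache.type == type ? &singleCache : NULL;
}

/* New scheme: one lazily initialized cache per glyph type. */
static GlyphCache aaCache, lcdCache;

static GlyphCache *getCacheSeparate(GlyphType type, int fastPathAvailable)
{
    GlyphCache *cache = (type == GLYPH_LCD) ? &lcdCache : &aaCache;
    if (!cache->initialized) {
        initCache(cache, type, fastPathAvailable);
    }
    return cache;
}

/* A glyph can be cached only if its image fits into one cell. */
static int glyphFitsCell(const GlyphCache *cache, int glyphWidth, int glyphHeight)
{
    assert(cache != NULL && glyphWidth >= 0 && glyphHeight >= 0);
    return glyphWidth <= cache->cellDim && glyphHeight <= cache->cellDim;
}
```

For example, if the GUI requests an AA cache first, getCacheSingle(GLYPH_LCD, ...) then returns NULL, while getCacheSeparate still hands LCD text its own cache. The glyph sizes in any such check are illustrative only.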
>>> Performance results overview:
>>> * MBP with Intel Iris (retina, texture barrier is available):
>>>   http://cr.openjdk.java.net/~bae/8087201/9/mbp-intel-iris.txt
>>>
>>> * iMac with AMD HD6750M (no retina, texture barrier is available):
>>>   http://cr.openjdk.java.net/~bae/8087201/9/imac-amd-hd6750m.txt
>>>
>>> * MBP with OSX10.8, NV GF9600M (no retina, no texture barrier):
>>>   http://cr.openjdk.java.net/~bae/8087201/9/mbp-10.8-NVGF9600M.txt
>>>
>>> Please take a look.
>>>
>>> Thanks,
>>> Andrew
>>
>>
>

-- 
Best regards, Sergey.
