[OpenJDK 2D-Dev] [11] Upgrade to Marlin renderer 0.9.1

Tue Mar 6 08:38:04 UTC 2018

Hi Sergey,

Good to get feedback from the java2d team !

I am investing a lot of my own time in improving java2d with the Marlin
renderer & other optimizations (like these ones), so I expect more
involvement from the openjdk 2d community to help me with testing,
commenting and reviewing patches. I suffer from discussion lags & from
being alone in pushing the limits !
I know the java2d team is very small, but the community is larger, and java
(swing) desktop applications such as IDEs (netbeans, eclipse or idea) are
widely used and critical for any java development.

Who would join my efforts to run tests, benchmarks ... ?

>> First Marlin 0.9.1 patch works well: no crash or bug with larger tiles
>> 128x64 even if d3d / opengl uses internally 32x32 texture caches.
>> Moreover xrender pipeline is the fastest compared to D3D (40% slower) or
>> opengl (>250% slower) !
>>
>
> What types of GraphicsPrimitives were tested (in terms of java2d)? I guess
> that d3d/ogl may outperform other pipelines only in case of "native" blits,
> which are used when drawing a cached BufferedImage (OGL texture) or
> VolatileImage (FBO) to the window/volatile image (this also depends on the
> composite type and alpha).
>

I just ran my own MapBench tool (1 thread) that performs GeneralPath draw /
fill operations (+ image clear). This tool (source code + binary releases)
is available on my github:
https://github.com/bourgesl/mapbench

MapBench uses either ARGB BufferedImages (sw loops) or VolatileImages
(d3d/ogl/xr accelerated), i.e. maskfill only, as I mainly focused on Marlin
(AA) performance and I think AA is generally used in 2018.
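
To make that kind of workload concrete, here is a minimal sketch of an
antialiased path fill + draw on an ARGB BufferedImage (the software loops
case only). It is not MapBench code: the class name, the zigzag test shape
and the iteration counts are assumptions chosen for illustration.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Path2D;
import java.awt.image.BufferedImage;

// Hypothetical micro-benchmark sketch, not taken from MapBench.
final class PathFillSketch {

    /** Builds a closed zigzag polygon with the given number of teeth (illustrative shape). */
    static Path2D.Double zigzag(int teeth, double width, double height) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(0, height);
        double step = width / (2.0 * teeth);
        for (int i = 0; i < teeth; i++) {
            path.lineTo((2 * i + 1) * step, 0);
            path.lineTo((2 * i + 2) * step, height / 2);
        }
        path.lineTo(width, height);
        path.closePath();
        return path;
    }

    /** Clears the image, then fills and strokes the path with AA; returns elapsed nanoseconds. */
    static long renderOnce(BufferedImage image, Path2D path) {
        long start = System.nanoTime();
        Graphics2D g = image.createGraphics();
        try {
            // image clear: clearRect paints the (transparent) background color
            g.setBackground(new Color(0, 0, 0, 0));
            g.clearRect(0, 0, image.getWidth(), image.getHeight());
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                               RenderingHints.VALUE_ANTIALIAS_ON);
            g.setColor(Color.BLUE);
            g.fill(path);
            g.setColor(Color.RED);
            g.draw(path);
        } finally {
            g.dispose();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(512, 512, BufferedImage.TYPE_INT_ARGB);
        Path2D path = zigzag(200, 512, 512);
        for (int i = 0; i < 50; i++) {
            renderOnce(image, path); // warm-up so the JIT compiles the hot loops
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 50; i++) {
            best = Math.min(best, renderOnce(image, path));
        }
        System.out.printf("best clear+fill+draw: %.3f ms%n", best / 1e6);
    }
}
```

Comparing the accelerated pipelines would need the same loop on a
VolatileImage in a non-headless environment, which this sketch leaves out.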

(J2DBench does not have proper path tests even though I worked with Jim in
2015 to add specific test cases for GeneralPath + multi-threading).

Finally, I was surprised to see that performance differs so much between
pipelines on the same hardware (dual boot). Moreover, windows is a major
platform and I supposed the java2d d3d pipeline was highly tuned for such a
major use case. Of course, that means there are still many things to improve
in 2018 (fast GPU) for the benefit of the overall java community.

> In other cases it will be slower since it uses an additional layer -
> RenderQueue, it would be good to compare xrender and gdi/X11.
>

Yes, d3d/ogl use deferred rendering, so RenderQueue buffering has costs
(1 extra mask copy + command arguments). I wonder how to improve such a
buffer queue to reduce the synchronization overhead induced by flushNow(),
and why increasing the buffer improved the performance so much (less JNI
overhead, more threads ?)
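
One part of the effect can be illustrated with a toy model: with a
fixed-size command buffer that is flushed synchronously when full, the
number of flushes drops as the capacity grows. The sketch below is a
hypothetical stand-in, not the actual RenderQueue code; the class, the
opcode value and the 32x32 mask size per command are assumptions, and it
only counts flushes (it does not measure JNI or GPU costs).

```java
import java.nio.ByteBuffer;

// Hypothetical model of a single command buffer with a synchronous flush.
final class CommandBufferSketch {
    private final ByteBuffer buffer;
    private int flushCount;
    private long bytesDrained;

    CommandBufferSketch(int capacityBytes) {
        this.buffer = ByteBuffer.allocate(capacityBytes);
    }

    /** Queues a 4-byte opcode plus its payload; flushes first if the command would not fit. */
    void enqueue(int opcode, byte[] payload) {
        int needed = Integer.BYTES + payload.length;
        if (needed > buffer.capacity()) {
            throw new IllegalArgumentException("command larger than buffer");
        }
        if (buffer.remaining() < needed) {
            flushNow();
        }
        buffer.putInt(opcode).put(payload); // the mask is copied into the buffer
    }

    /** Stand-in for the blocking hand-off to the native consumer. */
    void flushNow() {
        if (buffer.position() == 0) {
            return; // nothing queued
        }
        flushCount++;
        bytesDrained += buffer.position();
        buffer.clear();
    }

    int flushCount() { return flushCount; }

    long bytesDrained() { return bytesDrained; }

    /** Enqueues `tiles` mask commands of tileW x tileH bytes and returns the flush count. */
    static int flushesFor(int capacityBytes, int tiles, int tileW, int tileH) {
        CommandBufferSketch queue = new CommandBufferSketch(capacityBytes);
        byte[] mask = new byte[tileW * tileH];
        for (int i = 0; i < tiles; i++) {
            queue.enqueue(1, mask); // opcode 1 is an arbitrary placeholder
        }
        queue.flushNow(); // drain what is left
        return queue.flushCount();
    }

    public static void main(String[] args) {
        for (int kb : new int[] {32, 128, 512, 4096}) {
            System.out.printf("%5d KB buffer -> %d flushes%n",
                    kb, flushesFor(kb * 1024, 1000, 32, 32));
        }
    }
}
```

Fewer flushes means fewer blocking hand-offs per frame, but this model says
nothing about how long the consumer takes to drain a larger buffer.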

I will test neither gdi nor x11, as nowadays the officially supported
pipelines are xrender (linux), opengl (mac) & d3d (windows).

>
> Some unrelated question: it is interesting how xrender will work in
> Wayland.
>
>
> Note 1. OGL is not officially supported on linux. We need to check ogl
> performance on macOS where it is used as a default pipeline.
>

1. Yes I know but I wanted to evaluate if my RenderQueue buffer change
impacted positively the opengl pipeline. It improved performance on both
windows & linux by the same order.

2. It takes a lot of my time to run benchmarks on many platforms (my own
personal machines) so I have not had time yet to test on macOS.

I would really appreciate it if I or any QA people could run benchmarks on
a shared testing platform (win, linux, mac, sparc) having all sorts of CPU &
GPU.
Would Oracle or adoptopenjdk allow external access to such a platform
(to be built) ?

>
> Note 2. there are some other blit-related tiles like:
> #define OGLC_BLIT_TILE_SIZE 128
>
> Also please be careful about different vendors:
> OGLC_VENDOR_INTEL/OGLC_VENDOR_ATI/OGLC_VENDOR_NVIDIA because their native
> blits are implemented differently. The reason was the performance of some
> OGL APIs (maybe this code is outdated).
>

Thanks for the pointer, I will have a look. As said, I want to improve
Marlin performance first.

>
> I suggest to use some common testcases from J2DBench and SwingMark, so at
> some point later it will be possible to check other changes for possible
> regression, for example see:
> https://bugs.openjdk.java.net/browse/JDK-8059944
> Note that this fix in some cases decreased performance by half but in
> other cases improved it by 25422.21%. You can see we can improve
> performance in one case but lose much more in another. This is why J2DBench
> is better because it can check all combinations of
> src/dst/composite/clip/size/etc..
>

Ok, I will take time to run J2DBench on my machines (with/without patch)
and share the results.

>
>> Sorry to insist but who could advise me and give me explanations on the
>> RenderQueue & d3d / opengl backends.
>>
>> I read JBS for RenderQueue buffering as I have several questions
>> (asynchronous queue ?)
>> How to auto-tune such buffer capacity ?
>> It seems tricky as too small or too large buffers impact performance, as
>> it depends on the GPU speed (drain the buffer).
>> Is there any design document ?
>>
>
> As far as I know there are no documents about tile tuning, most
> decisions were made according to j2dbench results. But this code still uses
> ogl_2 and d3d_9 so it is possible that the new versions of these APIs
> have better alternatives.
>
>> PS: I will once again look into java2d pipelines for tile constants (32)
>> to see if other parts should be updated (TexturePaint ?)...
>> I also need help on testing such patches on many hw platforms to detect
>> regressions (j2dBench, MapBench...)
>>
>
> I guess we can run these tests on at least supported configurations.
>

It would be awesome if you could try my very basic changes to RenderQueue
/ OGLRenderQueue on your machines and run J2DBench to detect any regression
/ performance change.
AFAIU RenderQueue locks awt, so only one thread uses the buffer at a time:
the producer thread is either writing into the buffer or waiting for
the queue to be flushed (as fast as possible) by the ogl / d3d consumer (C
code).
I know game engines use several command buffers (flip) so that producer
threads do not get stuck, which enables more concurrency. Maybe it would be
a nice approach to have, but it means rewriting larger parts of the
BufferedRenderer ... and pipeline code.
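
As a rough illustration of the flip idea (a sketch under my own
assumptions, not a proposal for the actual java2d code), two buffers can be
swapped between one producer and one consumer with
java.util.concurrent.Exchanger: the producer keeps writing into one buffer
while the consumer drains the other. The class name and the use of plain
ints as "commands" are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Exchanger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical double-buffered command queue: one producer, one consumer.
final class FlipBufferSketch {

    /** Sends the ints 1..commands through two swapped buffers; returns the sum the consumer drained. */
    static long run(int commands, int capacityBytes) {
        if (capacityBytes < Integer.BYTES) {
            throw new IllegalArgumentException("buffer too small for one command");
        }
        Exchanger<ByteBuffer> exchanger = new Exchanger<>();
        AtomicLong drainedSum = new AtomicLong();

        Thread consumer = new Thread(() -> {
            ByteBuffer drainBuf = ByteBuffer.allocate(capacityBytes); // starts empty
            try {
                while (true) {
                    // hand back an empty buffer, receive a filled one
                    drainBuf = exchanger.exchange(drainBuf);
                    drainBuf.flip();
                    if (!drainBuf.hasRemaining()) {
                        break; // an empty buffer is the end-of-stream marker
                    }
                    while (drainBuf.hasRemaining()) {
                        drainedSum.addAndGet(drainBuf.getInt());
                    }
                    drainBuf.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer");
        consumer.start();

        try {
            ByteBuffer fillBuf = ByteBuffer.allocate(capacityBytes);
            for (int i = 1; i <= commands; i++) {
                if (fillBuf.remaining() < Integer.BYTES) {
                    // flip: give the full buffer away and continue in the other one;
                    // this blocks only while the consumer is still draining
                    fillBuf = exchanger.exchange(fillBuf);
                }
                fillBuf.putInt(i);
            }
            if (fillBuf.position() > 0) {
                fillBuf = exchanger.exchange(fillBuf); // last partial buffer
            }
            exchanger.exchange(fillBuf); // empty buffer: tells the consumer to stop
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return drainedSum.get();
    }

    public static void main(String[] args) {
        System.out.println("drained sum = " + run(10_000, 256));
    }
}
```

With a single buffer the producer waits for every flush to complete; here
it waits only when the consumer has not finished the previous buffer yet.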

In conclusion, I will go on investing my own time in improving Java2D
pipelines only if some joint effort happens soon (clemens is already
active even if he has less time than me) to make such potential
enhancements within the OpenJDK 11 timeframe.
How many people in the Java2d group / team (Oracle and any other company)
are still present now ?
I know the java2d implementation is quite old (the opengl pipeline was
added in 2005 !) so most people have left or are no longer working on java
graphics.

Best regards,
Laurent