Bugfix JDK-8229902 rendering time improvement

dani-kurmann dani-kurmann at protonmail.com
Thu Nov 7 09:58:27 UTC 2024


Sorry for my slow working pace, I can only spare about a day a week for this, hope this is ok.

I'm not done yet, but I could use a little feedback.

I tried different versions of separating the addBuffer and flush procedures, playing with the sequence of things, etc. Long story short: you can break alot of other things and make it work as a Bugfix. But the reliable Bugfix for JDK-8229902 is not calling flush() in freeSpace(int size).

However, there is definitifely a problem in the buffer management. All I can say with confidence for now is that there are unnecessary flush-calls, and I strongly suspect also unnecessary addBuffer-calls.

So there is rendering time and possibly memory usage to be optimised. And maybe not just for Canvas, this touches a lot more in webkit rendering.
I'm inclined to see this as two different problems: 1) Bug JDK-8229902 and 2) RenderQueue Buffer management
Maybe improving Buffer management will at some point also solve 1), so I will first look into 2) and then see if 1) is still needed.

I havent looked much into the memory usage yet. What seems to improve rendering time the most is removing the flush()-call from WCRenderQueue::addBuffer(). And as far as I see, this doesnt even break anything. (there are some flushBuffer()-calls to be further analysed, some can be changed to flush(), some can be omitted. I still think, something in flush() needs fixing, but im not there yet. Still learing how to navigate and understand the jfx codebase)
This improves Canvas rendering time by about one third. And as far as I see, doesnt break anything. Now when I broke my build-setup last week, I tested it out on jfx-21. There I could see also improvements in rendering other things than canvas.
Back on the current Master of openjfx, that effect seems to go away, or at least hide in the margin of error. Overall, Webkit rendering time has significantly improved since 21. However, Canvas rendering time improvement with the change in WCRenderQueue on the current master is still at around one third.

So I'm hoping that if I understand the rendering time improvements done since jfx 21, I might better understand the problem at hand.

I could do something like a binary search for the change. But with my build-time, this will take me about a day. So another week.

If someone knows from the top of their head, what commit(s) significantly improved WebKit rendering time since jfx21, you could really save me some time. Maybe this is a change in Webkit, but I suspect its a change in how jfx uses webkit.

I mean, even if its just the Canvas rendering time improvement, that is something worth looking into, right?

thanks for any help or hints

kind regards
Dani

Sent with [Proton Mail](https://proton.me/mail/home) secure email.

On Friday, October 25th, 2024 at 1:28 PM, dani-kurmann <dani-kurmann at protonmail.com> wrote:

> Ok, the flush() function and its business in WCRenderQueue definitively deserve a closer look:
>
> I tried to do a proof-of-concept of how to test for this bug:
> https://github.com/CodeMonkeyIsland/Bugtest-JDK-8229902
>
> Its in a "its not pretty, but it works... sometimes"-state. Not ready for anything. But I discovered something interesting:
>
> Bug.java is the adjusted test (not unit test) from https://bugs.openjdk.org/browse/JDK-8229902, I passed the function in the html-document, because executeScript wasnt working. You can see the Bug there.
> In CanvasRenderingTest.java, I was testing out the testing strategy. Not very gracefully, I just handed over the pixelarray to be tested as a String and then decoded it into an int array again.
> I had some Problem with gc, I think, so I did the whole painting and extracting the imagadata with engine.executeScript(). - And the Bug doesnt show up anymore! ...
>
> I tried with bigger canvas, but it seems to work, just my test breaks at some point ... gc again I'm guessing.
>
> So definitively, the code is working as intended under some circumstances. So understanding when it works and when it doesnt work, probably means understanding the actual problem.
>
> Ill definitively have a deeper look into this sometime next week. And hopefully define the problem better.
>
> kind regards
>
> Dani
>
> Sent with [Proton Mail](https://proton.me/mail/home) secure email.
>
> On Monday, October 21st, 2024 at 6:19 PM, Kevin Rushforth <kevin.rushforth at oracle.com> wrote:
>
>> Hi Dani,
>>
>> Welcome, and thank you for taking a look at this problem.
>>
>> I don't know enough yet to suggest the best course of action, but I took a quick look at it. I only see the one call to the native RenderingQueue::flush -- from freeSpace(int) if autoFlush is true. RenderingQueue::flush does a Java upcall to WCRenderQueue::fwkFlush which then calls WCRenderQueue::flush to decode and process the buffer. The native RenderingQueue::flushBuffer method does a Java upcall to WCRenderQueue::fwkAddBuffer which will add the buffer to a linked list of buffers, and then allocate a new BufferData wrapper object. It then calls WCRenderQueue::flush if the total size of the linked list of buffers it too large.
>>
>> One thing that should be checked is whether the call to WCRenderQueue::flush from native RenderingQueue::flushBuffers and then again from RenderingQueue::flush is causing the problem. Instrumenting the code on the Java side (which is usually easier if it gives you the information you need), might show what's going on.
>>
>> -- Kevin
>>
>> On 10/20/2024 7:01 AM, dani-kurmann wrote:
>>
>>> Hi everyone, nice to be here.
>>>
>>> I don't want to be overzealous, but I did sign up for a bugfix :)
>>> So here is me, socializing:
>>>
>>> I ran into the bug in a similiar scenario as described in the Bug report: https://bugs.openjdk.org/browse/JDK-8229902
>>>
>>> It's a buffer-problem. The culprit is the flush()-call on line 64 of /modules/javafx.web/src/main/native/Source/WebCore/platform/graphics/java/RenderingQueue.cpp (in the freeSpace(int size) function)
>>>
>>> if (m_buffer && !m_buffer->hasFreeSpace(size)) {
>>> flushBuffer();
>>>
>>> if (m_autoFlush) {
>>> flush();
>>> }
>>>
>>> }
>>>
>>> this gets called, before a shape(or possibly something else) is added to the bytebuffer. If there is not enough space left in the buffer, here we go:
>>>
>>> The flushBuffer()-Call works fine. It sets up new Buffer Space as intended and, I guess, flushes the buffer.
>>> The flush()-call however leads to the described Bug. Just comment it out to test it. But thats not a bugfix, of course.
>>>
>>> Its very interesting where the two calls lead, this is right in the sweet spot between java and c++ and openjdk and webkit. I dont claim to understand half of it. But I do find it very interesting.
>>>
>>> This is the point where I ask for help: I can propose a bugfix, but I want to understand the issue deeper. Maybe this is a newbie-problem, but I have problems understanding the above mentioned sweet spot. So any advice is very welcome!
>>>
>>> Please DO correct me, and show me that other approach that I dont see, but the way I see it, there's two ways to approach this thing:
>>> A) Treat it as an initialisation-problem. e.g. There is nothing wrong with RenderingQueue, it just needs to be initialised with m_autoFlush=false. (in this case, at least)
>>> B) There seems to be something strange going on in RenderingQueue, understand it and fix it.
>>>
>>> A) is a lot easier: just follow back to where the RenderingQueue Constructor gets called: /modules/javafx.web/src/main/native/Source/WebCore/platform/graphics/java/ImageBufferJavaBackend.cpp, line 82:
>>>
>>> auto context = makeUnique<GraphicsContextJava>(new PlatformContextJava(wcRenderQueue, true));
>>>
>>> The second Argument for the PlatformContext-constructor sets the m_autoFlush-Variable upon which the flush()-call in the freeSpace-function in RenderingQueue.cpp is conditional. Set it to false.
>>> as a git commit: https://github.com/CodeMonkeyIsland/jfx/commit/96f23307c7b6f324bac416b90c0eac4ad40b13a8
>>>
>>> pro:
>>> -simple fix
>>> -no need to mess around in more than one file
>>> -looks like this is a normal way to use RenderingQueue
>>> con:
>>> -more and more feels like a hack
>>>
>>> B) is can of worms: You can comment out not only the flush()-call in the freespace-function, but the flush() function itself. At first, I thought, that the flush()-function is still needed for a flush-call during flushBuffer() outside RenderingQueue. But that was not the same flush()-function. I just tested that, and it works. (havent checked yet, if this breaks something else)
>>> So assuming, the flush()-function is really defunct and not needed(and I would have to dive into this a lot deeper to make that statement as a fact), this is a chance to do some spring-cleaning in RenderingQueue. Again, I only see two main ways of approaching this:
>>> - make m_autoFlush obsolete. Since (if the assumption above is right) it already flushes regardless of how m_autoFlush is set, lets stop pretending. Remove m_autoFlush, adjust the constructors and the constructor calls outside RenderingQueue.
>>> - make m_autoFlush great again :) It would basically mean separating flushBuffer() into two functions, one that does the addBuffer-part and one that does the flushing. Then do the addBuffer unconditional and the flushing conditional on m_autoFlush().
>>> The "problem" here being, that other parts of the codebase seem to have gone the way of approach A). So another piece of code initializing RenderingQueue with m_autoFlush=false doesnt mean that they dont want to autoFlush. So every call of the constructor has to be analysed, if the intention is to autoflush or not. I dont see the use-case for not flushing. So this approach might very well lead to a situation, where every single constructor call sets m_autoFlush to true and m_autoFlush is kind of obsolete again. Or not. Anyone has an example, where it shouldnt flush? Maybe there is runtime to be optimised by planning when to flush? I dont know...
>>>
>>> pro:
>>> -if there is a problem in RenderingQueue, then thats the place to fix it! No matter the consequences.
>>> con:
>>> -even if its an easy fix in RenderingQueue, it will change its behaviour. Or at the very least make autoFlush obsolete. This has consequences.
>>>
>>> Approach B) seems a lot scarier than A), but if it must be done, it should be done. If you guys think this is the rabbit hole to be, I would be willing to go there. However, I don't feel competent to overview all the consequences of this. I could really need some advice and help on that road, I guess thats why Im here.
>>> I could imagine doing A) as a temporary fix and an exercise. Then exploring B) deeper with some help.
>>>
>>> As for testing: At first glance, this seems like a non-automated Mk.1 eyeball-test. On a closer look however, you could do something like the Bug-Class in the bug-report, extract the RenderingContext-imagedata, pass it from js to java and check the color of the pixel, to see if a shape has been drawn. A window would have to be opened for this test, I dont know if thats allowed. (But why not, I guess)
>>>
>>> So what do you guys think? Does that sound like a plan? Does this break something, that I don't see?
>>> I am new here. Anyone willing to have a look into this and maybe point me in the right direction sometimes?
>>>
>>> kind regards
>>>
>>> Dani Kurmann
>>>
>>> p.s. the Bug class in the bug report does not work anymore.(at least for me, forked or unforked) the script in executeScript doesn't get executed. As a quick fix, you can pass the function in the html document like so:
>>>
>>> WebEngine engine = webView.getEngine();
>>> String content="<html><body><canvas id=\"canvas\" width=\"600\" height=\"600\" style=\"border: 1px solid red;\"></canvas><script>window.onload = function() {";
>>> content=content
>>> + " var canvas = document.getElementById('canvas');"
>>> + " var c = canvas.getContext('2d');"
>>> + " var size = 4;"
>>> + " var step = size + 2;"
>>> + " for (var y = 0; y <= canvas.height; y = y + step) {"
>>> + " for (var x = 0; x <= canvas.width; x = x + step) {"
>>> + " c.fillRect(x, y, size, size);"
>>> + " }"
>>> + " }";
>>> content=content+"}</script></body></html>";  engine.loadContent(content);
>>>
>>> Sent with [Proton Mail](https://proton.me/mail/home) secure email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/openjfx-dev/attachments/20241107/04e89c81/attachment-0001.htm>


More information about the openjfx-dev mailing list