RFR: 8313648: JavaFX application continues to show a black screen after graphic card driver crash
Thorsten Fischer
duke at openjdk.org
Mon Nov 27 20:00:18 UTC 2023
On Mon, 27 Nov 2023 15:00:30 GMT, mintykat <duke at openjdk.org> wrote:
>> Hi,
>>
>> I did open the bug report. Some notes to this PR:
>>
>> My colleagues and I are able to reproduce this bug regularly, even though it takes sometimes up to 3 or 4 weeks until the D3DERR_DEVICEHUNG error shows up. We are currently evaluating two versions of fixes, but until now we do not have any results. I will post them as soon as I got them.
>>
>> Version 1 (this version): Based on the observation, that the TestCooperativeLevel/CheckDeviceState method returns D3D_OK again after about 20 - 60 seconds, the reinitialize is called after the first time the state is returning D3D_OK. The 'isHung' flag stores the information until then.
>>
>> Version 2: calls reinitialize directly after D3DERR_DEVICEHUNG has been returned. Basically
>> if (hr == D3DERR_DEVICEREMOVED || hr == D3DERR_DEVICEHUNG ) { .. }
>>
>> I did not modify the validatePresent method, as for our workaround (see ticket) it was not necessary. At least the native call swapchain->present does not return that error code (https://learn.microsoft.com/en-us/windows/win32/api/d3d9/nf-d3d9-idirect3dswapchain9-present). I did not look decisively into all the native calls behind D3DRTTexture#readPixels.
>>
>> As I said I will post the results (prism.verbose output) for the 2 versions later as a base for discussions.
>
> I have put this in D3DContext.java (as per customer suggestion). Just wondering if I should just reinitialize directly and not wait loop: in testLostStateAndReset in D3DContext.java (D3DERR_DEVICEREMOVED is handled further down)
> if (hr == D3DERR_DEVICEHUNG) {
>     setLost();
>
>     long retryMillis = TimeUnit.MINUTES.toMillis(5);
>     long sleepMillis = TimeUnit.SECONDS.toMillis(1);
>     //Is this loop necessary?
>     for (int i = 0; i < retryMillis; i += sleepMillis) {
>         int cooperativeLevel = D3DResourceFactory.nTestCooperativeLevel(pContext);
>         System.err.println("Checking Cooperative Level: " + cooperativeLevel);
>
>         if (cooperativeLevel == D3D_OK) {
>             break;
>         } else {
>             try {
>                 Thread.sleep(sleepMillis);
>             } catch (InterruptedException e) {
>                 e.printStackTrace();
>             }
>         }
>     }
>
>     // Reinitialize after 5 mins anyway, even if result is not OK.
>
>     // Reinitialize the D3DPipeline. This will dispose and recreate
>     // the resource factory and context for each adapt
>     D3DPipeline.getInstance().reinitialize();
>     LOGGER.warn("Reinit after graphics hang.");
> }
Hello @mintykat, the loop is not necessary. In fact, it is not recommended, because the Thread.sleep makes the whole application (window) unresponsive. It was only an approach to generate some debug output, to see how the system and D3D respond after the failure happens. The best option is to call reinitialize directly after the check for the D3DERR_DEVICEHUNG error code.
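As a rough illustration of "reinitialize directly", here is a minimal, self-contained sketch. It is not the actual D3DContext code: `HungDeviceSketch`, `handleDeviceState`, and the `Runnable` standing in for `D3DPipeline.getInstance().reinitialize()` are hypothetical names used only for this example, and the HRESULT constants are written out as I understand them from d3d9.h, so please verify them against the headers.

```java
/**
 * Hypothetical stand-in for the device-state check discussed above.
 * This is NOT the real D3DContext; it only illustrates the control flow.
 */
final class HungDeviceSketch {
    // HRESULT values assumed from d3d9.h; verify against the real headers.
    static final int D3D_OK = 0;
    static final int D3DERR_DEVICEREMOVED = 0x88760870;
    static final int D3DERR_DEVICEHUNG = 0x88760874;

    private HungDeviceSketch() {
    }

    /**
     * Triggers reinitialization immediately when the device reports that it
     * was removed or hung. There is no wait loop and no Thread.sleep, so the
     * calling thread is never blocked.
     *
     * @param hr           result of the device-state check
     * @param reinitialize placeholder for the pipeline reinitialize call
     * @return true if reinitialization was requested
     */
    static boolean handleDeviceState(int hr, Runnable reinitialize) {
        if (hr == D3DERR_DEVICEREMOVED || hr == D3DERR_DEVICEHUNG) {
            reinitialize.run();
            return true;
        }
        return false;
    }
}
```

The point of this shape is that both error codes share one branch, as in the "Version 2" condition quoted above, and nothing in that branch sleeps.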
I'm wondering a bit whether JavaFX may somehow be responsible for the crash, or whether it's just old drivers. This fix will keep the application running, but those crashes will still happen. You said that you can reproduce this error every couple of hours? If you have the capacity, maybe you can track down the root cause. I guess it is not a trivial task, though: I just [found here that some debug.dlls](https://learn.microsoft.com/en-us/windows/win32/direct3d9/troubleshooting#debugging) are needed. We had this error 'only' every few days (sometimes weeks), and I wasn't quite motivated for such a long-winded, deep-dive bug hunt :)
-------------
PR Comment: https://git.openjdk.org/jfx/pull/1199#issuecomment-1828503142