[OpenJDK 2D-Dev] <AWT Dev> RFR: 8240654 : ZGC can cause severe UI application repaint issues

Philip Race philip.race at oracle.com
Tue Jun 23 17:30:07 UTC 2020


For the record, Microsoft have no published a KB article acknowledging 
the problem:-

https://support.microsoft.com/en-us/help/4567569/gdi-apis-may-fail-when-large-pages-or-vad-spanning-is-used


-phil.



On 6/11/20, 9:12 PM, Jayathirth D v wrote:
> +1.
>
> Thanks,
> Jay
>
>> On 12-Jun-2020, at 12:16 AM, Philip Race <philip.race at oracle.com 
>> <mailto:philip.race at oracle.com>> wrote:
>>
>>
>> one more update to the tests
>> cr.openjdk.java.net/~prr/8240654.2/ 
>> <http://cr.openjdk.java.net/%7Eprr/8240654.2/>
>>
>> I added
>> @requires vm.gc.Z per the VM folks, ZGC needs Windows Server 2019 or 
>> the same vintage Window 10. if run on Windows Server 2016
>> ZGC errors out. This should prevent that. -phil.
>> On 6/11/2020 11:13 AM, Kevin Rushforth wrote:
>>> +1
>>>
>>> The updated LargeWindowPaintTest fails for me without the fix 
>>> without my having to manually set uiScale. It passes with the fix.
>>>
>>> Interestingly enough, I finally saw the problem that Jay reported 
>>> with AlphaPrintTest: without the fix I initially get a blank (all 
>>> white) window. If I resize it then it is drawn. With the fix 
>>> everything is fine.
>>>
>>> -- Kevin
>>>
>>> On 6/11/2020 10:48 AM, Philip Race wrote:
>>>> Updated webrev here
>>>>
>>>> http://cr.openjdk.java.net/~prr/8240654.1/index.html
>>>>
>>>> The only changes are to the tests - to add -Dsun.java2d.uiScale=1 
>>>> to the onscreen test
>>>> and to add printer to the keys for the printing test.
>>>>
>>>> It was pointed out by Stefan from the ZGC team that the changes in 
>>>> awt_TrayIcon.cpp and awt_Cursor.cpp
>>>> should not be needed because the GDI code in Create_BMP that 
>>>> ultimately consumes the data
>>>> has processed and copied it into memory allocated by 
>>>> CreateDIBSection before passing it to
>>>> CreateBitmap. I considered reverting those two files but decided to 
>>>> keep them because I
>>>> think I would like this fix anyway. We really don't need to lock 
>>>> down the VM in these cases.
>>>>
>>>> -phil.
>>>>
>>>>
>>>> On 6/11/20, 9:55 AM, Philip Race wrote:
>>>>> I have confirmed hit a different code path. It goes through 
>>>>> generic 2D s/w loops in this case.
>>>>> ie we don't use GDIBlitLoops at all. The code in 
>>>>> sun/java2d/pipe/DrawImage.java ends up in
>>>>> scaleSurfaceData which uses the loops in ScaledBlit.c.
>>>>>
>>>>> It is a bit surprising to me since I'd expect us to be able to 
>>>>> blit directly at device resolution.
>>>>> Could we also be taking a performance hit here ? The D3D case 
>>>>> doesn't not go through this loop.
>>>>>
>>>>> However all that is outside the scope of this fix ... I think 
>>>>> setting uiScale=1 in the test is all that needs to be done.
>>>>>
>>>>> -phil.
>>>>>
>>>>> On 6/11/2020 7:51 AM, Philip Race wrote:
>>>>>> Or, maybe we hit a different code path. I'll check that.
>>>>>> uiScale=1 is the way to ensure we hit this code path.
>>>>>>
>>>>>> -phil.
>>>>>>
>>>>>> On 6/11/20, 7:44 AM, Philip Race wrote:
>>>>>>> Can I get clarification here.
>>>>>>>
>>>>>>> > I do, and had to run with "-Dsun.java2d.uiScale=1" in order to 
>>>>>>> see the failure with LargeWindowPaintTest.
>>>>>>>
>>>>>>> So you both mean a JDK 15 promoted build without this fix and 
>>>>>>> without this property passes because you have
>>>>>>> a hidpi setup. And to see the failure without the fix you needed 
>>>>>>> the above property.
>>>>>>> If so we could just be looking at a similar anomaly as I saw 
>>>>>>> with printing which uses a very large
>>>>>>> image - it reported failure but actually worked !
>>>>>>>
>>>>>>> Also - for both of you - with the fix and without forcing 
>>>>>>> uiScale=1 does the test pass ?
>>>>>>>
>>>>>>> -phil.
>>>>>>>
>>>>>>> On 6/11/20, 7:10 AM, Jayathirth D v wrote:
>>>>>>>> Yes my machine was at 150% scaling.
>>>>>>>>
>>>>>>>> If I force uiScale = 1, I see that:
>>>>>>>> LargeWindowPaintTest fails without patch and passes with patch.
>>>>>>>> AlphaPrintTest shows instructions without patch also.
>>>>>>>>
>>>>>>>> @Phil : I think its better if we test at uiScale=1(larger 
>>>>>>>> memory footprint). Please clarify.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jay
>>>>>>>>
>>>>>>>>> On 11-Jun-2020, at 5:53 PM, Kevin Rushforth 
>>>>>>>>> <kevin.rushforth at oracle.com 
>>>>>>>>> <mailto:kevin.rushforth at oracle.com>> wrote:
>>>>>>>>>
>>>>>>>>> Do you have a Hi-DPI machine? I do, and had to run with 
>>>>>>>>> "-Dsun.java2d.uiScale=1" in order to see the failure with 
>>>>>>>>> LargeWindowPaintTest.
>>>>>>>>>
>>>>>>>>> For AlphaPrintTest, the test deliberately ensures that you 
>>>>>>>>> print before saying whether it passes or not. FWIW, I verified 
>>>>>>>>> that the printing test on my system was hitting the fallback 
>>>>>>>>> code with the patch, but it seemed to print correctly even 
>>>>>>>>> without the patch.
>>>>>>>>>
>>>>>>>>> -- Kevin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/11/2020 1:58 AM, Jayathirth D v wrote:
>>>>>>>>>> Typo : I tried tested -> I tried testing
>>>>>>>>>>
>>>>>>>>>>> On 11-Jun-2020, at 2:27 PM, Jayathirth D v 
>>>>>>>>>>> <JAYATHIRTH.D.V at ORACLE.COM 
>>>>>>>>>>> <mailto:JAYATHIRTH.D.V at ORACLE.COM>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Phil,
>>>>>>>>>>>
>>>>>>>>>>> I tried tested the fix in my Windows 10 machine with Intel 
>>>>>>>>>>> integrated UHD Graphics 620.
>>>>>>>>>>>
>>>>>>>>>>> LargeWindowPaintTest.java passes with/without fix in my machine.
>>>>>>>>>>> AlphaPrintTest.java without fix just opens up blank frame 
>>>>>>>>>>> without any instructions and with fix it shows instructions 
>>>>>>>>>>> for the test.
>>>>>>>>>>> Is this expected behaviour?
>>>>>>>>>>>
>>>>>>>>>>> AlphaPrintTest.java with fix when it shows instructions if I 
>>>>>>>>>>> click on Pass(Since I don’t have printer right now) 
>>>>>>>>>>> it doesn’t pass/close the window. Only after I click on 
>>>>>>>>>>> Print button and then close print dialog it allows me to 
>>>>>>>>>>> click on Pass button.
>>>>>>>>>>>
>>>>>>>>>>> Also how does these tests behave in our internal CI machines?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jay
>>>>>>>>>>>
>>>>>>>>>>>> On 11-Jun-2020, at 2:18 AM, Philip Race 
>>>>>>>>>>>> <philip.race at oracle.com <mailto:philip.race at oracle.com>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8240654
>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~prr/8240654/index.html
>>>>>>>>>>>>
>>>>>>>>>>>> This is for JDK 15 so review ASAP please since RDP 1 and 
>>>>>>>>>>>> the test cycle are looming.
>>>>>>>>>>>>
>>>>>>>>>>>> This is not a fix for a JDK bug. It is a bunch of 
>>>>>>>>>>>> workarounds for a Microsoft Windows bug affecting
>>>>>>>>>>>> GDI in the context of ZGC (http://openjdk.java.net/jeps/333).
>>>>>>>>>>>> Some extra details about the Windows bug at the end, but 
>>>>>>>>>>>> first the technical details of the fix.
>>>>>>>>>>>>
>>>>>>>>>>>> With ZGC's memory allocation requirement of reserving 
>>>>>>>>>>>> memory in 2Mb chunks  some Windows GDI
>>>>>>>>>>>> functions, mostly involving some bitmaps APIs may return a 
>>>>>>>>>>>> failure code (ie fail!)
>>>>>>>>>>>> This typically occurs when Java heap memory is used for a 
>>>>>>>>>>>> Java image and then in a JNI
>>>>>>>>>>>> call we use GetPrimitiveArrayCritical so that Java heap 
>>>>>>>>>>>> allocated memory is passed to a GDI
>>>>>>>>>>>> function AND the Java heap memory spans one of the 2Mb 
>>>>>>>>>>>> boundaries.
>>>>>>>>>>>> This is very easy to trigger in almost any Java UI app if 
>>>>>>>>>>>> the window is of a large enough (ie typical) size.
>>>>>>>>>>>> NB: if you have an Nvidia or ATI card, then you won't see 
>>>>>>>>>>>> it, because the D3D pipeline doesn't
>>>>>>>>>>>> call the affected method but if you have an Intel chip as 
>>>>>>>>>>>> do 90% (?) of laptops you will see it.
>>>>>>>>>>>> There are also several other places we found that are 
>>>>>>>>>>>> affected. Printing is the other one
>>>>>>>>>>>> somewhat easy to trigger. The others : custom cursors and 
>>>>>>>>>>>> tray icons are less common.
>>>>>>>>>>>> The painful thing here is that there is no definitive list 
>>>>>>>>>>>> (a list of the known ones is below) of
>>>>>>>>>>>> affected Windows GDI APIs and we are just hunting around 
>>>>>>>>>>>> our code trying to see where it
>>>>>>>>>>>> might be side-swiped by this bug.
>>>>>>>>>>>>
>>>>>>>>>>>> The basic approach in these workarounds is that for cases 
>>>>>>>>>>>> where performance does not matter we now copy
>>>>>>>>>>>> and for cases where performance does matter or larger 
>>>>>>>>>>>> amounts of memory is involved we check if
>>>>>>>>>>>> the return value of the GDI function indicates failure and 
>>>>>>>>>>>> then re-try with a copy of the heap memory.
>>>>>>>>>>>> Unless GDI was randomly failing already (unlikely) this 
>>>>>>>>>>>> should be a no-risk solution in the high profile cases.
>>>>>>>>>>>> We have done performance measurements on the important 
>>>>>>>>>>>> screen case and the failures
>>>>>>>>>>>> happen fast so the penalty is then in the re-try which is 
>>>>>>>>>>>> only if ZGC is enabled.
>>>>>>>>>>>> Always copying the memory is slower (and memcpy is the slow 
>>>>>>>>>>>> operation) than an alternative approach
>>>>>>>>>>>> that "knows" about the memory allocation of ZGC but this 
>>>>>>>>>>>> coupling and the complexity seem like they aren't
>>>>>>>>>>>> worth it since I haven't seen any visible performance 
>>>>>>>>>>>> consequence. That can be revisited
>>>>>>>>>>>> some day if need be, but for now we have correctness which 
>>>>>>>>>>>> is the key as well as sufficient performance.
>>>>>>>>>>>>
>>>>>>>>>>>> I've created an automated test for the most important 
>>>>>>>>>>>> on-screen case.
>>>>>>>>>>>> Also a manual printing test case which invokes ZGC is 
>>>>>>>>>>>> provided since there we also only
>>>>>>>>>>>> conditionally copy. In the other cases we now always copy 
>>>>>>>>>>>> so existing test cases should over those.
>>>>>>>>>>>>
>>>>>>>>>>>> There is some clean up in this fix - one completely unused  
>>>>>>>>>>>> (provably so because it was #if'd out)
>>>>>>>>>>>> JNI method in awt_PrintJob.cpp is removed since it had code 
>>>>>>>>>>>> that looked like it needed a workaround,
>>>>>>>>>>>> which would be somewhat of a waste of effort.
>>>>>>>>>>>>
>>>>>>>>>>>> the doPrintBand code and its callee bitsToDevice has code I 
>>>>>>>>>>>> think we can remove too since
>>>>>>>>>>>> I don't see how it ever gets executed (the top down case 
>>>>>>>>>>>> for browserPrint == true) but
>>>>>>>>>>>> I think I'll save that for a P4 follow-on since it does 
>>>>>>>>>>>> nothing that would be affected by this
>>>>>>>>>>>> Windows bug.
>>>>>>>>>>>>
>>>>>>>>>>>> One oddity is the in the printing case I observed that some 
>>>>>>>>>>>> times the rendering is performed
>>>>>>>>>>>> even if an error code is returned. I don't know why, but in 
>>>>>>>>>>>> code we can't tell that it was actually
>>>>>>>>>>>> rendered and in any case there is no harm in repeating the 
>>>>>>>>>>>> call with copied memory.
>>>>>>>>>>>>
>>>>>>>>>>>> We are right before the JDK15 stabilisation fork and this 
>>>>>>>>>>>> fix needs to go there and will
>>>>>>>>>>>> but the webrev is against jdk/client simply because jdk15 
>>>>>>>>>>>> does not exist yet !
>>>>>>>>>>>>
>>>>>>>>>>>> Please test and review ASAP.
>>>>>>>>>>>>
>>>>>>>>>>>> About the bug:
>>>>>>>>>>>> Microsoft has acknowleged the bug and will publish a 
>>>>>>>>>>>> knowledge base article about it
>>>>>>>>>>>> but a fix may show up only in a future version of Windows. 
>>>>>>>>>>>> Not, it seems, any time soon.
>>>>>>>>>>>> Below is a list of potentially affected GDI APIs. Per 
>>>>>>>>>>>> microsoft whether it actually manifests in
>>>>>>>>>>>> any specific case depends on "branching"
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/previous-versions/windows/desktop/wcs/checkbitmapbits
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/previous-versions/windows/desktop/wcs/createcolortransform
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-setdibitstodevice
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-stretchdibits
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-getbitmapbits
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-createdibitmap
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-createdibsection
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-polydraw
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-drawescape
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-createbitmap
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-setbitmapbits
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-getdibits
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -phil.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/2d-dev/attachments/20200623/7f7ef58c/attachment-0001.htm>


More information about the 2d-dev mailing list