[OpenJDK 2D-Dev] <AWT Dev> RFR: 8240654 : ZGC can cause severe UI application repaint issues

Kevin Rushforth kevin.rushforth at oracle.com
Thu Jun 11 18:52:38 UTC 2020


-- Kevin

On 6/11/2020 11:46 AM, Philip Race wrote:
> one more update to the tests
> cr.openjdk.java.net/~prr/8240654.2/
> I added
> @requires vm.gc.Z per the VM folks, ZGC needs Windows Server 2019 or 
> the same vintage Window 10. if run on Windows Server 2016
> ZGC errors out. This should prevent that. -phil.
> On 6/11/2020 11:13 AM, Kevin Rushforth wrote:
>> +1
>> The updated LargeWindowPaintTest fails for me without the fix without 
>> my having to manually set uiScale. It passes with the fix.
>> Interestingly enough, I finally saw the problem that Jay reported 
>> with AlphaPrintTest: without the fix I initially get a blank (all 
>> white) window. If I resize it then it is drawn. With the fix 
>> everything is fine.
>> -- Kevin
>> On 6/11/2020 10:48 AM, Philip Race wrote:
>>> Updated webrev here
>>> http://cr.openjdk.java.net/~prr/8240654.1/index.html
>>> The only changes are to the tests - to add -Dsun.java2d.uiScale=1 to 
>>> the onscreen test
>>> and to add printer to the keys for the printing test.
>>> It was pointed out by Stefan from the ZGC team that the changes in 
>>> awt_TrayIcon.cpp and awt_Cursor.cpp
>>> should not be needed because the GDI code in Create_BMP that 
>>> ultimately consumes the data
>>> has processed and copied it into memory allocated by 
>>> CreateDIBSection before passing it to
>>> CreateBitmap. I considered reverting those two files but decided to 
>>> keep them because I
>>> think I would like this fix anyway. We really don't need to lock 
>>> down the VM in these cases.
>>> -phil.
>>> On 6/11/20, 9:55 AM, Philip Race wrote:
>>>> I have confirmed hit a different code path. It goes through generic 
>>>> 2D s/w loops in this case.
>>>> ie we don't use GDIBlitLoops at all. The code in 
>>>> sun/java2d/pipe/DrawImage.java ends up in
>>>> scaleSurfaceData which uses the loops in ScaledBlit.c.
>>>> It is a bit surprising to me since I'd expect us to be able to blit 
>>>> directly at device resolution.
>>>> Could we also be taking a performance hit here ? The D3D case 
>>>> doesn't not go through this loop.
>>>> However all that is outside the scope of this fix ... I think 
>>>> setting uiScale=1 in the test is all that needs to be done.
>>>> -phil.
>>>> On 6/11/2020 7:51 AM, Philip Race wrote:
>>>>> Or, maybe we hit a different code path. I'll check that.
>>>>> uiScale=1 is the way to ensure we hit this code path.
>>>>> -phil.
>>>>> On 6/11/20, 7:44 AM, Philip Race wrote:
>>>>>> Can I get clarification here.
>>>>>> > I do, and had to run with "-Dsun.java2d.uiScale=1" in order to 
>>>>>> see the failure with LargeWindowPaintTest.
>>>>>> So you both mean a JDK 15 promoted build without this fix and 
>>>>>> without this property passes because you have
>>>>>> a hidpi setup. And to see the failure without the fix you needed 
>>>>>> the above property.
>>>>>> If so we could just be looking at a similar anomaly as I saw with 
>>>>>> printing which uses a very large
>>>>>> image - it reported failure but actually worked !
>>>>>> Also - for both of you - with the fix and without forcing 
>>>>>> uiScale=1 does the test pass ?
>>>>>> -phil.
>>>>>> On 6/11/20, 7:10 AM, Jayathirth D v wrote:
>>>>>>> Yes my machine was at 150% scaling.
>>>>>>> If I force uiScale = 1, I see that:
>>>>>>> LargeWindowPaintTest fails without patch and passes with patch.
>>>>>>> AlphaPrintTest shows instructions without patch also.
>>>>>>> @Phil : I think its better if we test at uiScale=1(larger memory 
>>>>>>> footprint). Please clarify.
>>>>>>> Thanks,
>>>>>>> Jay
>>>>>>>> On 11-Jun-2020, at 5:53 PM, Kevin Rushforth 
>>>>>>>> <kevin.rushforth at oracle.com 
>>>>>>>> <mailto:kevin.rushforth at oracle.com>> wrote:
>>>>>>>> Do you have a Hi-DPI machine? I do, and had to run with 
>>>>>>>> "-Dsun.java2d.uiScale=1" in order to see the failure with 
>>>>>>>> LargeWindowPaintTest.
>>>>>>>> For AlphaPrintTest, the test deliberately ensures that you 
>>>>>>>> print before saying whether it passes or not. FWIW, I verified 
>>>>>>>> that the printing test on my system was hitting the fallback 
>>>>>>>> code with the patch, but it seemed to print correctly even 
>>>>>>>> without the patch.
>>>>>>>> -- Kevin
>>>>>>>> On 6/11/2020 1:58 AM, Jayathirth D v wrote:
>>>>>>>>> Typo : I tried tested -> I tried testing
>>>>>>>>>> On 11-Jun-2020, at 2:27 PM, Jayathirth D v 
>>>>>>>>>> <mailto:JAYATHIRTH.D.V at ORACLE.COM>> wrote:
>>>>>>>>>> Hi Phil,
>>>>>>>>>> I tried tested the fix in my Windows 10 machine with Intel 
>>>>>>>>>> integrated UHD Graphics 620.
>>>>>>>>>> LargeWindowPaintTest.java passes with/without fix in my machine.
>>>>>>>>>> AlphaPrintTest.java without fix just opens up blank frame 
>>>>>>>>>> without any instructions and with fix it shows instructions 
>>>>>>>>>> for the test.
>>>>>>>>>> Is this expected behaviour?
>>>>>>>>>> AlphaPrintTest.java with fix when it shows instructions if I 
>>>>>>>>>> click on Pass(Since I don’t have printer right now) 
>>>>>>>>>> it doesn’t pass/close the window. Only after I click on Print 
>>>>>>>>>> button and then close print dialog it allows me to click on 
>>>>>>>>>> Pass button.
>>>>>>>>>> Also how does these tests behave in our internal CI machines?
>>>>>>>>>> Thanks,
>>>>>>>>>> Jay
>>>>>>>>>>> On 11-Jun-2020, at 2:18 AM, Philip Race 
>>>>>>>>>>> <philip.race at oracle.com <mailto:philip.race at oracle.com>> wrote:
>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8240654
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~prr/8240654/index.html
>>>>>>>>>>> This is for JDK 15 so review ASAP please since RDP 1 and the 
>>>>>>>>>>> test cycle are looming.
>>>>>>>>>>> This is not a fix for a JDK bug. It is a bunch of 
>>>>>>>>>>> workarounds for a Microsoft Windows bug affecting
>>>>>>>>>>> GDI in the context of ZGC (http://openjdk.java.net/jeps/333).
>>>>>>>>>>> Some extra details about the Windows bug at the end, but 
>>>>>>>>>>> first the technical details of the fix.
>>>>>>>>>>> With ZGC's memory allocation requirement of reserving memory 
>>>>>>>>>>> in 2Mb chunks  some Windows GDI
>>>>>>>>>>> functions, mostly involving some bitmaps APIs may return a 
>>>>>>>>>>> failure code (ie fail!)
>>>>>>>>>>> This typically occurs when Java heap memory is used for a 
>>>>>>>>>>> Java image and then in a JNI
>>>>>>>>>>> call we use GetPrimitiveArrayCritical so that Java heap 
>>>>>>>>>>> allocated memory is passed to a GDI
>>>>>>>>>>> function AND the Java heap memory spans one of the 2Mb 
>>>>>>>>>>> boundaries.
>>>>>>>>>>> This is very easy to trigger in almost any Java UI app if 
>>>>>>>>>>> the window is of a large enough (ie typical) size.
>>>>>>>>>>> NB: if you have an Nvidia or ATI card, then you won't see 
>>>>>>>>>>> it, because the D3D pipeline doesn't
>>>>>>>>>>> call the affected method but if you have an Intel chip as do 
>>>>>>>>>>> 90% (?) of laptops you will see it.
>>>>>>>>>>> There are also several other places we found that are 
>>>>>>>>>>> affected. Printing is the other one
>>>>>>>>>>> somewhat easy to trigger. The others : custom cursors and 
>>>>>>>>>>> tray icons are less common.
>>>>>>>>>>> The painful thing here is that there is no definitive list 
>>>>>>>>>>> (a list of the known ones is below) of
>>>>>>>>>>> affected Windows GDI APIs and we are just hunting around our 
>>>>>>>>>>> code trying to see where it
>>>>>>>>>>> might be side-swiped by this bug.
>>>>>>>>>>> The basic approach in these workarounds is that for cases 
>>>>>>>>>>> where performance does not matter we now copy
>>>>>>>>>>> and for cases where performance does matter or larger 
>>>>>>>>>>> amounts of memory is involved we check if
>>>>>>>>>>> the return value of the GDI function indicates failure and 
>>>>>>>>>>> then re-try with a copy of the heap memory.
>>>>>>>>>>> Unless GDI was randomly failing already (unlikely) this 
>>>>>>>>>>> should be a no-risk solution in the high profile cases.
>>>>>>>>>>> We have done performance measurements on the important 
>>>>>>>>>>> screen case and the failures
>>>>>>>>>>> happen fast so the penalty is then in the re-try which is 
>>>>>>>>>>> only if ZGC is enabled.
>>>>>>>>>>> Always copying the memory is slower (and memcpy is the slow 
>>>>>>>>>>> operation) than an alternative approach
>>>>>>>>>>> that "knows" about the memory allocation of ZGC but this 
>>>>>>>>>>> coupling and the complexity seem like they aren't
>>>>>>>>>>> worth it since I haven't seen any visible performance 
>>>>>>>>>>> consequence. That can be revisited
>>>>>>>>>>> some day if need be, but for now we have correctness which 
>>>>>>>>>>> is the key as well as sufficient performance.
>>>>>>>>>>> I've created an automated test for the most important 
>>>>>>>>>>> on-screen case.
>>>>>>>>>>> Also a manual printing test case which invokes ZGC is 
>>>>>>>>>>> provided since there we also only
>>>>>>>>>>> conditionally copy. In the other cases we now always copy so 
>>>>>>>>>>> existing test cases should over those.
>>>>>>>>>>> There is some clean up in this fix - one completely unused 
>>>>>>>>>>> (provably so because it was #if'd out)
>>>>>>>>>>> JNI method in awt_PrintJob.cpp is removed since it had code 
>>>>>>>>>>> that looked like it needed a workaround,
>>>>>>>>>>> which would be somewhat of a waste of effort.
>>>>>>>>>>> the doPrintBand code and its callee bitsToDevice has code I 
>>>>>>>>>>> think we can remove too since
>>>>>>>>>>> I don't see how it ever gets executed (the top down case for 
>>>>>>>>>>> browserPrint == true) but
>>>>>>>>>>> I think I'll save that for a P4 follow-on since it does 
>>>>>>>>>>> nothing that would be affected by this
>>>>>>>>>>> Windows bug.
>>>>>>>>>>> One oddity is the in the printing case I observed that some 
>>>>>>>>>>> times the rendering is performed
>>>>>>>>>>> even if an error code is returned. I don't know why, but in 
>>>>>>>>>>> code we can't tell that it was actually
>>>>>>>>>>> rendered and in any case there is no harm in repeating the 
>>>>>>>>>>> call with copied memory.
>>>>>>>>>>> We are right before the JDK15 stabilisation fork and this 
>>>>>>>>>>> fix needs to go there and will
>>>>>>>>>>> but the webrev is against jdk/client simply because jdk15 
>>>>>>>>>>> does not exist yet !
>>>>>>>>>>> Please test and review ASAP.
>>>>>>>>>>> About the bug:
>>>>>>>>>>> Microsoft has acknowleged the bug and will publish a 
>>>>>>>>>>> knowledge base article about it
>>>>>>>>>>> but a fix may show up only in a future version of Windows. 
>>>>>>>>>>> Not, it seems, any time soon.
>>>>>>>>>>> Below is a list of potentially affected GDI APIs. Per 
>>>>>>>>>>> microsoft whether it actually manifests in
>>>>>>>>>>> any specific case depends on "branching"
>>>>>>>>>>> https://docs.microsoft.com/en-us/previous-versions/windows/desktop/wcs/checkbitmapbits
>>>>>>>>>>> https://docs.microsoft.com/en-us/previous-versions/windows/desktop/wcs/createcolortransform
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-setdibitstodevice
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-stretchdibits
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-getbitmapbits
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-createdibitmap
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-createdibsection
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-polydraw
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-drawescape
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-createbitmap
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-setbitmapbits
>>>>>>>>>>> https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-getdibits
>>>>>>>>>>> -phil.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/2d-dev/attachments/20200611/ffaecfff/attachment-0001.htm>

More information about the 2d-dev mailing list