RFR(xxs): 8185706: Native callstacks unreliable under Windows x64
Thomas Stüfe
thomas.stuefe at gmail.com
Tue Aug 8 05:50:16 UTC 2017
Hi Ioi,
On Mon, Aug 7, 2017 at 7:39 PM, Ioi Lam <ioi.lam at oracle.com> wrote:
> Hi Thomas,
>
> Thanks for the patch!
>
> Skipping the test for SP != NULL and FP != NULL seems generally OK to me.
> I think StackWalk64 should be robust enough that when given NULL or bogus
> values for stk.AddrStack.Offset and stk.AddrFrame.Offset, it will still
> somehow recover gracefully. I forgot exactly why I put in these checks,
> though. I either was overly cautious, or I might have seen some problems
> without such checks, which might have caused crashes inside the debug
> printing routine. I really should have put in a comment there :-(
>
> By being generous to myself :-), I guess I would have put in a comment
> had I seen a crash, so the lack of comments probably means I was just
> overly cautious ....
>
> How much testing have you done with your patch?
Pretty much only the error scenario (java -XX:ErrorHandlerTest=xx) and
the gtests, both on Win x64.
> Have you seen any crash inside the printing routine?
>
None I would attribute to my change. I know there is a very slight risk of
crashing more often now, simply because we now continue stack dumping where
we stopped before, and because StackWalk64 is a black box. But this is
error handling: we have to deal with secondary crashes anyway, and I would
rather have more complete callstacks in the hs-err file and risk a
secondary crash than have useless error reports.
Note that callstack dumping and symbol resolution are pretty unreliable and
unstable on Windows anyway. See
https://bugs.openjdk.java.net/browse/JDK-8185712; I am currently working on
upstreaming improvements we have in our fork. Our error handling is
more reliable than that of stock OpenJDK.
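
To make the behavioural difference concrete, here is a small portable model
of the loop condition being discussed. It is only a sketch and not the
HotSpot code: WalkedFrame, collect_pcs and the stop_on_null_sp_or_fp flag
are made-up names, and the vector stands in for the frames that
StackWalk64 would hand back one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the values read from each walked frame.
// This is NOT the real STACKFRAME64 layout.
struct WalkedFrame {
  std::uintptr_t pc;  // program counter reported for the frame
  std::uintptr_t sp;  // stack pointer reported for the frame
  std::uintptr_t fp;  // frame pointer value; often 0 or unrelated on Win x64
};

// Returns the pcs that would be printed for the given sequence of frames.
// stop_on_null_sp_or_fp == true  models the old check (pc, sp and fp must
//                                all be non-null, otherwise stop).
// stop_on_null_sp_or_fp == false models the relaxed check (only pc counts).
std::vector<std::uintptr_t> collect_pcs(const std::vector<WalkedFrame>& frames,
                                        bool stop_on_null_sp_or_fp,
                                        std::size_t max_frames) {
  assert(max_frames > 0);
  std::vector<std::uintptr_t> pcs;
  for (const WalkedFrame& f : frames) {
    if (pcs.size() >= max_frames) break;  // assumed print limit
    if (f.pc == 0) break;                 // nothing left to print
    if (stop_on_null_sp_or_fp && (f.sp == 0 || f.fp == 0)) {
      break;                              // old behaviour: give up early
    }
    pcs.push_back(f.pc);
  }
  return pcs;
}
```

With frames whose frame pointer is 0, the strict variant yields nothing,
while the relaxed variant yields every frame that has a pc, up to the
limit.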
>
> Also, by "Native callstacks unreliable", do you mean "Native callstacks
> printing terminates prematurely", and not "sometimes they fail and print
> erroneous information or behave unexpectedly"? I think it's better to
> update the bug title.
>
>
Sure, that's a better name :) I changed it.
> If you need a sponsor, I'll be happy to do it.
>
>
Thanks!
Now for a second reviewer? Anyone?
> Thanks
> - Ioi
>
>
..Thomas
>
>
> On 8/2/17 2:17 AM, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> may I please have a review for this small fix.
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8185706
>> Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8185706-Native-callstacks-unreliable-under-Windows-x64/webrev.00/webrev/
>>
>> This can be seen as an addon to
>> https://bugs.openjdk.java.net/browse/JDK-8022335. Ioi Lam did a good job
>> analyzing the original problem. On Windows x64, the native compiler
>> generates code which does not use the frame pointer (regardless of
>> whether we set -Oy-). Only in rare cases is a frame pointer used - e.g.
>> for functions calling alloca() - and, as Ioi pointed out, there is no
>> guarantee either that RBP is actually the frame pointer.
>>
>> So, in os::platform_print_native_stack() we walk the stack using
>> StackWalk64(), extract the pc from each frame and print that, like
>> normal Windows coding. However, we still test for the frame pointer
>> being NULL, and abort stack tracing if it is. This causes stack dumping
>> to fail quite often, and unnecessarily.
>>
>> For example, test: java.exe -XX:ErrorHandlerTest=12
>>
>> Sometimes it works, but more by accident - as Ioi pointed out in this
>> mail thread:
>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2013-August/009063.html.
>> If there are Java frames above the crashing native frame, we may still
>> have RBP set to some value (it does not matter which) and
>> os::platform_print_native_stack() does not abort frame printing.
>>
>> Kind Regards, Thomas
>>
>
>