RFR(xxs): 8185706: Native callstacks unreliable under Windows x64
Ioi Lam
ioi.lam at oracle.com
Mon Aug 7 17:39:39 UTC 2017
Hi Thomas,
Thanks for the patch!
Skipping the test for SP != NULL and FP != NULL seems generally OK to
me. I think StackWalk64 should be robust enough that, when given NULL or
bogus values for stk.AddrStack.Offset and stk.AddrFrame.Offset, it will
still somehow recover gracefully. I forget exactly why I put in these
checks, though. Either I was overly cautious, or I saw problems without
such checks, possibly crashes inside the debug printing routine. I really
should have put in a comment there :-(
Being generous to myself :-), I guess I would have put in a comment
had I seen a crash, so the lack of comments probably means I was just
overly cautious...
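To make the difference concrete, here is a minimal, hypothetical sketch of the loop logic. It is not the actual os_windows.cpp code. It models each StackWalk64() step as a (pc, sp, fp) triple mirroring stk.AddrPC.Offset, stk.AddrStack.Offset and stk.AddrFrame.Offset. The struct WalkedFrame and both functions are invented for illustration, and treating pc == 0 as the end of the walk is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of what one StackWalk64() step reports.
// Field names mirror STACKFRAME64 offsets, but this is NOT the Windows type.
struct WalkedFrame {
  std::uint64_t pc;  // like stk.AddrPC.Offset
  std::uint64_t sp;  // like stk.AddrStack.Offset
  std::uint64_t fp;  // like stk.AddrFrame.Offset; often 0 or arbitrary on x64
};

// Sketch of the pre-patch behaviour: printing stops as soon as SP or FP is 0,
// so frames compiled without a frame pointer cut the trace short.
std::vector<std::uint64_t> pcs_with_sp_fp_check(
    const std::vector<WalkedFrame>& walk, std::size_t max_frames) {
  std::vector<std::uint64_t> printed;
  for (const WalkedFrame& f : walk) {
    if (printed.size() >= max_frames) break;
    if (f.pc == 0 || f.sp == 0 || f.fp == 0) break;  // FP == 0 aborts early
    printed.push_back(f.pc);
  }
  return printed;
}

// Sketch of the patched behaviour: only the PC is consulted; SP and FP are
// left to StackWalk64() itself, since x64 code rarely keeps a frame pointer.
std::vector<std::uint64_t> pcs_pc_only(
    const std::vector<WalkedFrame>& walk, std::size_t max_frames) {
  std::vector<std::uint64_t> printed;
  for (const WalkedFrame& f : walk) {
    if (printed.size() >= max_frames) break;
    if (f.pc == 0) break;  // assumed end-of-walk marker
    printed.push_back(f.pc);
  }
  return printed;
}
```

With frame-pointer-omitted frames (fp == 0), the first variant prints nothing while the second prints every PC. When RBP happens to hold some nonzero value, as with Java frames on the stack, both variants agree, which matches the "works by accident" behavior described in the quoted mail.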
How much testing have you done with your patch? Have you seen any crashes
inside the printing routine?
Also, by "Native callstacks unreliable", do you mean "Native callstacks
printing terminates prematurely", and not "sometimes they fail and print
erroneous information or behave unexpectedly"? I think it's better to
update the bug title.
If you need a sponsor, I'll be happy to do it.
Thanks
- Ioi
On 8/2/17 2:17 AM, Thomas Stüfe wrote:
> Hi all,
>
> may I please have a review for this small fix.
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8185706
> Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8185706-Native-callstacks-unreliable-under-Windows-x64/webrev.00/webrev/
>
> This can be seen as an add-on to https://bugs.openjdk.java.net/browse/JDK-8022335.
> Ioi Lam did a good job analyzing the original problem. On Windows x64, the
> native compiler generates code that does not use the frame pointer
> (regardless of whether we set -Oy-). Only in rare cases is a frame pointer
> used, e.g. for functions calling alloca(), and, as Ioi pointed out, even then
> there is no guarantee that RBP actually holds the frame pointer.
>
> So, in os::platform_print_native_stack(), we walk the stack using
> StackWalk64(), extract the pc from each frame and print it, as is usual in
> Windows code. However, we still test for the frame pointer being NULL and
> abort stack tracing if it is. This causes stack dumping to fail quite often,
> and unnecessarily.
>
> For example, test: java.exe -XX:ErrorHandlerTest=12
>
> Sometimes it works, but more by accident, as Ioi pointed out in this
> mail thread: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2013-August/009063.html.
> If there are Java frames above the crashing native frame, RBP may still be
> set to some value (it does not matter which), and
> os::platform_print_native_stack() does not abort frame printing.
>
> Kind Regards, Thomas