RFR(xxs): 8185706: Native callstacks unreliable under Windows x64

Ioi Lam ioi.lam at oracle.com
Mon Aug 7 17:39:39 UTC 2017


Hi Thomas,

Thanks for the patch!

Skipping the test for SP != NULL and FP != NULL seems generally OK to
me. I think StackWalk64 should be robust enough that, when given NULL or
bogus values for stk.AddrStack.Offset and stk.AddrFrame.Offset, it will
still somehow recover gracefully. I forget exactly why I put in these
checks, though. Either I was overly cautious, or I might have seen some
problems without such checks, which might have caused crashes inside the
debug printing routine. I really should have put in a comment there :-(

Being generous to myself :-), I guess I would have put in a comment
had I seen a crash, so the lack of comments probably means I was just
overly cautious ...
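
To make the question concrete, here is a minimal, portable sketch of the
loop shape under discussion. It is not the HotSpot code: Frame and walk()
are hypothetical stand-ins for STACKFRAME64 and the StackWalk64() loop,
the frame values are supplied by the caller, and treating pc == 0 as the
end of the walk is an assumption of the sketch.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the STACKFRAME64 fields the loop looks at.
struct Frame {
    std::uint64_t pc;  // models stk.AddrPC.Offset
    std::uint64_t sp;  // models stk.AddrStack.Offset
    std::uint64_t fp;  // models stk.AddrFrame.Offset (often 0 on x64)
};

// Prints the pc of each frame and returns the number of frames printed.
// require_sp_fp == true  : stop at the first frame with sp == 0 or fp == 0
//                          (the checks being discussed).
// require_sp_fp == false : only pc == 0 ends the walk (assumed end marker).
int walk(const std::vector<Frame>& frames, bool require_sp_fp) {
    int printed = 0;
    for (const Frame& f : frames) {
        if (f.pc == 0) {
            break;  // nothing left to print
        }
        if (require_sp_fp && (f.sp == 0 || f.fp == 0)) {
            break;  // the walk is abandoned although pc is still usable
        }
        std::printf("frame %d: pc=0x%llx\n", printed,
                    static_cast<unsigned long long>(f.pc));
        ++printed;
    }
    return printed;
}
```

Given three frames with valid pc values where only the first has a
non-zero fp, walk() returns 1 with the checks enabled and 3 without them.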

How much testing have you done with your patch? Have you seen any crashes
inside the printing routine?

Also, by "Native callstacks unreliable", do you mean "Native callstack
printing terminates prematurely", and not "sometimes they fail and print
erroneous information or behave unexpectedly"? If so, I think it would be
better to update the bug title.

If you need a sponsor, I'll be happy to do it.

Thanks
- Ioi



On 8/2/17 2:17 AM, Thomas Stüfe wrote:
> Hi all,
>
> may I please have a review for this small fix.
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8185706
> Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8185706-Native-callstacks-unreliable-under-Windows-x64/webrev.00/webrev/
>
> This can be seen as an add-on to
> https://bugs.openjdk.java.net/browse/JDK-8022335. Ioi Lam did a good job
> analyzing the original problem. On Windows x64, the native compiler
> generates code which does not use the frame pointer (regardless of whether
> we set -Oy-). A frame pointer is used only in rare cases - e.g. for
> functions that call alloca() - and, as Ioi pointed out, even then there is
> no guarantee that RBP is actually the frame pointer.
>
> So, in os::platform_print_native_stack() we walk the stack using
> StackWalk64(), extract the pc from each frame and print that, like normal
> Windows coding. However, we still test for the frame pointer being NULL,
> and abort stack tracing if it is. This causes stack dumping to fail quite
> often, and unnecessarily.
>
> For example, test: java.exe -XX:ErrorHandlerTest=12
>
> Sometimes it works, but more by accident - as Ioi pointed out in this
> mail thread:
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2013-August/009063.html.
> If there are Java frames above the crashing native frame, we may still
> have RBP set to some value (it does not matter which) and
> os::platform_print_native_stack() does not abort frame printing.
>
> Kind Regards, Thomas


