RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v2]

Thomas Stuefe stuefe at openjdk.java.net
Fri Jan 28 10:02:15 UTC 2022


On Fri, 28 Jan 2022 09:19:04 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

> > Two general remarks. One concern I have is that the new functionality should be super stable, since nothing is more annoying than crashing during stack dumping in the hs-err file; I would much rather have a call stack without bells and whistles than an abridged one. Maybe, if we get secondary crashes during call stack dumping in hs-err printing, we could repeat the step with all optional features (including name demangling) disabled? This could also be done in a separate RFE. We'll know when this happens, and we can react then.
> 
> I absolutely agree - stability should be the primary concern. An incomplete hs-err file should be avoided at any cost. Doing an additional "catch and repeat without optional features" sounds interesting to get more safety. Would such a thing be easy to add? Yes, it might be better to do that in a separate RFE.

It is probably easy, but I also think this would be better in a separate RFE. And we already have a timeout per reporting step since JDK-8166944, so that long-running steps don't spoil error reporting for everyone. We can just add a second call stack print step that runs if the first one failed.
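
To make the idea concrete, here is a minimal, language-neutral sketch of such a two-step fallback. It is not HotSpot code: `print_frames`, `report_stack` and the `decorate` callback are hypothetical names, and a Python exception merely stands in for the secondary crash (or timeout) that the real error reporter would detect through its own mechanisms.

```python
from typing import Callable, List


def print_frames(frames: List[int], decorate: Callable[[int], str]) -> List[str]:
    """Format every frame; `decorate` may raise if optional decoding fails."""
    return [f"{pc:#018x}  {decorate(pc)}" for pc in frames]


def report_stack(frames: List[int], decorate_full: Callable[[int], str]) -> List[str]:
    """Try the full-featured step first, then fall back to a bare one."""
    try:
        # Step 1: all optional features enabled (source info, demangling, ...).
        return print_frames(frames, decorate_full)
    except Exception:
        # Step 2: repeat the whole step with optional features disabled,
        # so the report still contains a complete, if plain, call stack.
        return print_frames(frames, lambda pc: "<no source info>")
```

The point of the design is that the fallback repeats the entire step rather than resuming mid-way, so the resulting stack is never abridged.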

> 
> > Another small concern: we parse the ELF file while dumping the stack, right? I remember having a lot of problems on Solaris when dumping call stacks, because parsing the ELF file there was really slow. That delayed call stack printing by a lot, so much that the ErrorCrashTimeout often kicked in and spoiled the crash logs for us.
> 
> Yes, a pc for a frame is directly parsed when printing the corresponding frame. It takes some more time to do the additional parsing but not that much. These are the timestamps from a quick `-XX:CICrashAt=1` run with `-Xlog:dwarf=info` on my local machine on `Ubuntu 20.04` with a `fastdebug` build:
> 
> ```
> [1.862s][info][dwarf] Open DWARF file: /home/christian/Downloads/test/jdk-19/fastdebug/lib/server/libjvm.debuginfo
> [1.867s][info][dwarf] pc: 0x00007ffa35c8a9cf, offset: 0x007749cf, filename: c1_Compiler.cpp, line: 250
> [1.871s][info][dwarf] pc: 0x00007ffa35fbfb28, offset: 0x00aa9b28, filename: compileBroker.cpp, line: 2291
> [1.876s][info][dwarf] pc: 0x00007ffa35fc08e8, offset: 0x00aaa8e8, filename: compileBroker.cpp, line: 1966
> [1.881s][info][dwarf] pc: 0x00007ffa36e50cca, offset: 0x0193acca, filename: thread.cpp, line: 1297
> [1.890s][info][dwarf] pc: 0x00007ffa36e59010, offset: 0x01943010, filename: thread.cpp, line: 358
> [1.897s][info][dwarf] pc: 0x00007ffa36b3c524, offset: 0x01626524, filename: os_linux.cpp, line: 705
> ```
> 
> The parsing of a single pc takes a little less than 0.01s. Of course, this is not a great way to measure performance. It also highly depends on the source files themselves, the machine setup etc. Thus, this cannot be considered a valid performance test. But still, I think these numbers can give us some indication of the order of magnitude. Compared to the current `ErrorLogTimeout` default value of 2min this looks promising.

Okay, this looks reasonable. In our case, I remember having a very slow file system and an overloaded machine. But that would also be solved by just repeating call stack printing if the first attempt times out.
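
As a back-of-the-envelope check of the quoted log, the gaps between consecutive timestamps can be computed as below. This is only a rough sketch: it assumes each log line is emitted right after the corresponding pc was parsed, and it takes the 2 min `ErrorLogTimeout` default from the quoted text. The variable names are made up for illustration.

```python
# Timestamps from the quoted -Xlog:dwarf=info output, in milliseconds.
# The first entry is the "Open DWARF file" line, the rest are the six pcs.
timestamps_ms = [1862, 1867, 1871, 1876, 1881, 1890, 1897]

# Gap between consecutive log lines, taken as the approximate per-pc cost.
deltas_ms = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]

total_ms = sum(deltas_ms)
worst_ms = max(deltas_ms)

# Default timeout as stated in the quoted text: 2 minutes.
timeout_ms = 2 * 60 * 1000

# How many frames would fit into the timeout at the worst observed cost.
frames_within_timeout = timeout_ms // worst_ms

print(deltas_ms, total_ms, worst_ms, frames_within_timeout)
# prints: [5, 4, 5, 5, 9, 7] 35 9 13333
```

So on that machine the six frames cost about 35 ms in total, several orders of magnitude below the timeout; a slow file system or an overloaded host could of course change the picture.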

Cheers, and thanks for this patch!

..Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/7126


