RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v4]

Thomas Schatzl tschatzl at openjdk.java.net
Tue Feb 22 11:42:55 UTC 2022


On Tue, 8 Feb 2022 08:17:17 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> When printing the native stack trace on Linux (mostly done for hs_err files), it only prints the method with its parameters and a relative offset in the method:
>> 
>> Stack: [0x00007f6e01739000,0x00007f6e0183a000],  sp=0x00007f6e01838110,  free space=1020k
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x620d86]  Compilation::~Compilation()+0x64
>> V  [libjvm.so+0x624b92]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec
>> V  [libjvm.so+0x8303ef]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899
>> V  [libjvm.so+0x82f067]  CompileBroker::compiler_thread_loop()+0x3df
>> V  [libjvm.so+0x84f0d1]  CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69
>> V  [libjvm.so+0x1209329]  JavaThread::thread_main_inner()+0x15d
>> V  [libjvm.so+0x12091c9]  JavaThread::run()+0x167
>> V  [libjvm.so+0x1206ada]  Thread::call_run()+0x180
>> V  [libjvm.so+0x1012e55]  thread_native_entry(Thread*)+0x18f
>> 
>> This makes it sometimes difficult to see where exactly the methods were called from and sometimes almost impossible when there are multiple invocations of the same method within one method.
>> 
>> This patch improves this by providing source information (filename + line number) to the native stack traces on Linux similar to what's already done on Windows (see [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)):
>> 
>> Stack: [0x00007f34fca18000,0x00007f34fcb19000],  sp=0x00007f34fcb17110,  free space=1020k
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x620d86]  Compilation::~Compilation()+0x64  (c1_Compilation.cpp:607)
>> V  [libjvm.so+0x624b92]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec  (c1_Compiler.cpp:250)
>> V  [libjvm.so+0x8303ef]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899  (compileBroker.cpp:2291)
>> V  [libjvm.so+0x82f067]  CompileBroker::compiler_thread_loop()+0x3df  (compileBroker.cpp:1966)
>> V  [libjvm.so+0x84f0d1]  CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69  (compilerThread.cpp:59)
>> V  [libjvm.so+0x1209329]  JavaThread::thread_main_inner()+0x15d  (thread.cpp:1297)
>> V  [libjvm.so+0x12091c9]  JavaThread::run()+0x167  (thread.cpp:1280)
>> V  [libjvm.so+0x1206ada]  Thread::call_run()+0x180  (thread.cpp:358)
>> V  [libjvm.so+0x1012e55]  thread_native_entry(Thread*)+0x18f  (os_linux.cpp:705)
>> 
>> For Linux, we need to parse the debug symbols which are generated by GCC in DWARF - a standardized debugging format. This patch adds support for DWARF 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still experimental and not generated for HotSpot. However, newer GCC version may soon generate DWARF 5 by default in which case this parser either needs to be extended or the build of HotSpot configured to only emit DWARF 4. 
>> 
>> The code follows the parsing steps described in the official DWARF 4 spec: https://dwarfstd.org/doc/DWARF4.pdf
>> I added references to the corresponding sections throughout the code. However, I tried to explain the steps from the DWARF spec directly in the code (method names, comments etc.). This allows to follow the code without the need to actually deep dive into the spec. 
>> 
>> The comments at the `Dwarf` class in the `elf.hpp` file explain in more detail how a DWARF file is structured and how the parsing algorithm works to get to the filename and line number information. There are more class comments throughout the `elf.hpp` file about how different DWARF sections are structured and how the parsing algorithm needs to fetch the required information. Therefore, I will not repeat the exact workings of the algorithm here but refer to the code comments. I've tried to add as much information as possible to improve the readability.
>> 
>> Generally, I've tried to stay away from adding any assertions as this code is almost always executed when already processing a VM error. Instead, the DWARF parser aims to just exit gracefully and possibly omit source information for a stack frame instead of risking to stop writing the hs_err file when an assertion would have failed. To debug failures, `-Xlog:dwarf` can be used with `info`, `debug` or `trace` which provides logging messages throughout parsing. 
>> 
>> **Testing:**
>> Apart from manual testing, I've added two kinds of tests:
>> - A JTreg test: Spawns new VMs to let them crash in various ways. The test reads the created hs_err files to check if the DWARF parsing could correctly find the filename and line number. For normal HotSpot files, I could not check against hardcoded filenames and line numbers as they are subject to change (especially line number can quickly become different). I therefore just added some sanity checks in the form of "found a non-empty file" and "found a non-zero line number". On top of that, I added tests that let the VM crash in custom C files (which will not change). This enables an additional verification of hardcoded filenames and line numbers.
>> - Gtests: Directly calling the `get_source()` method which initiates DWARF parsing. Tested some special cases, for example, having a buffer that is not big enough to store the filename.
>> 
>> On top of that, there are also existing JTreg tests that call `-XX:NativeMemoryTracking=detail` which will print a native stack trace with the new source information. These tests were also run as part of the standard tier testing and can be considered as sanity tests for this implementation.
>> 
>> To make tests work in our infrastructure or if some other setups want to have debug symbols at different locations, I've added support for an additional  `_JVM_DWARF_PATH` environment variable. This variable can specify a path from which the DWARF symbol file should be read by the parser if the default locations do not contain debug symbols (required some `make` changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. The JTreg test, however, also works if there are no symbols available. In that case, the test just skips all the assertion checks for the filename and line number.
>> 
>> I haven't run any specific performance testing as this new code is mainly executed when an error will exit the VM and only if symbol files are available (which is normally not the case when using Java release builds as a user).
>> 
>> Special thanks to @tschatzl for giving me some pointers to start based on his knowledge from a DWARF 2 parser he once wrote in Pascal and for discussing approaches on how to retrieve the source information and to @erikj79 for providing help for the changes required for `make`!
>>  
>> Thanks,
>> Christian
>
> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Make dwarf tag NOT_PRODUCT

First pass, did not dive into details of the state machine yet.

src/hotspot/share/utilities/elfFile.cpp line 319:

> 317:     }
> 318:     log_develop_info(dwarf)("No separate .debuginfo file for library %s. It already contains the required DWARF sections.", _filepath);
> 319:     _dwarf_file = new (std::nothrow) DwarfFile(_filepath);

Would it be useful to explicitly bail out on a `nullptr` value here to avoid crashes below?

src/hotspot/share/utilities/elfFile.cpp line 357:

> 355:   }
> 356: 
> 357:   strcpy(debug_pathname, _filepath);

I'm always a bit uneasy using "raw" `strcpy` instead of `strncpy` and friends. The code seems to be correct though.

src/hotspot/share/utilities/elfFile.cpp line 358:

> 356: 
> 357:   strcpy(debug_pathname, _filepath);
> 358:   char* last_slash = strrchr(debug_pathname, '/');

It's probably no big issue hardcoding the forward slash here instead of using `os::file_separator()` in this method.

src/hotspot/share/utilities/elfFile.cpp line 407:

> 405: bool ElfFile::load_dwarf_file_from_env_path_folder(const char* env_path, const char* folder, const char* debug_filename, const uint32_t crc) {
> 406:   char* debug_pathname = NEW_RESOURCE_ARRAY(char, strlen(env_path) + strlen(folder) + strlen(debug_filename) + 2);
> 407:   strcpy(debug_pathname, env_path);

Similar to other resource allocations, this should bail out if the result is `nullptr`.

src/hotspot/share/utilities/elfFile.cpp line 566:

> 564: // http://sourceware.org/gdb/current/onlinedocs/gdb/Separate-Debug-Files.html#Separate-Debug-Files.
> 565: uint32_t ElfFile::gnu_debuglink_crc32(uint32_t crc, uint8_t* buf, const size_t len) {
> 566:   crc = ~crc & 0xffffffff;

The masks are unnecessary here but don't hurt. Feel free to keep.

src/hotspot/share/utilities/elfFile.cpp line 576:

> 574:   log_develop_info(dwarf)("Open DWARF file: %s", filepath);
> 575:   _dwarf_file = new (std::nothrow) DwarfFile(filepath);
> 576:   if (!_dwarf_file->is_valid_dwarf_file()) {

This should bail out if the `new` returned a `nullptr`.

src/hotspot/share/utilities/elfFile.cpp line 686:

> 684:   }
> 685: 
> 686:   // We must align to twice the address size.

Since alignment is based on address size? I.e. above, at the check whether addresses are correct, define address size and then multiply by 2 here.
This would also make the condition above look nicer, i.e. move the `[NOT_]LP64` outside of the condition.

src/hotspot/share/utilities/elfFile.cpp line 784:

> 782:   }
> 783: 
> 784:   if (!_reader.read_byte(&_header._address_size) || NOT_LP64(_header._address_size != 4)  LP64_ONLY( _header._address_size != 8)) {

Since this is the second time for the clause `|| NOT_LP64(_header._address_size != 4) LP64_ONLY( _header._address_size != 8)` maybe it is useful to make a constant out of the accepted address size somewhere instead of repeating this over and over.
It's value could even be something like `sizeof(intptr_t)` or so.

src/hotspot/share/utilities/elfFile.cpp line 814:

> 812:   log_develop_trace(dwarf)("Series of declarations [code, tag]:");
> 813:   AbbreviationDeclaration declaration;
> 814:   bool found_matching_declaration = false;

This variable is never used. Remove.

src/hotspot/share/utilities/elfFile.cpp line 944:

> 942: #else
> 943:       _reader.move_position(8);
> 944: #endif

Use `AddressSize` or similar here instead of the `#ifdef`.

src/hotspot/share/utilities/elfFile.cpp line 1026:

> 1024:         break;
> 1025:       } else {
> 1026:         if (!_reader.move_position(4)) {

Instead of hardcoding the `4` for lineptr/loclistptr/macptr/rangelistptr it would be nice to have a "DwarfOffset` constant of that value, since we only support 32 bit DWARF.

src/hotspot/share/utilities/elfFile.cpp line 1070:

> 1068:     // reason, GCC is currently using version 3 as specified in the DWARF 3 spec for the line number program even though GCC should
> 1069:     // be using version 4 for DWARF 4 as it emits DWARF 4 by default.
> 1070:     return false;

According to the specification (pg112):

> `version (uhalf)`
> A version number (see Appendix F). This number is specific to the line number information
> and is independent of the DWARF version number.

So this is just fine - actually things may break if the code accepted version 4 here assuming that there are breaking differences.
On the other hand Appendix F mentions that DWARF4 contains .debug_line information in version 4.

src/hotspot/share/utilities/elfFile.cpp line 1121:

> 1119:   // _debug_line_offset + 10 (=sizeof(_unit_length) + sizeof(_version) + sizeof(_header_length)) + _header_length.
> 1120:   _header._file_names_offset = _reader.get_position();
> 1121:   if (!_reader.set_position(shdr.sh_offset + _debug_line_offset + 10 + _header._header_length)) {

I would prefer a constant for this magic `10`. Thank you for the documentation.

src/hotspot/share/utilities/elfFile.hpp line 211:

> 209: 
> 210:   // Load the DWARF file (.debuginfo) that belongs to this file.
> 211:   bool load_dwarf_file();

It would be nice to summarize from which places this methods tries to load the debug info to prevent the need for digging for it in the method implementation.

src/hotspot/share/utilities/elfFile.hpp line 300:

> 298:  *  - debug: Prints the results of the steps (1) - (4) together with the generated line information matrix.
> 299:  *  - trace: Full logging information for intermediate states/results when parsing the DWARF file.
> 300:  */

Maybe add a comment that log output is only supported in non-product builds and the reason.

-------------

Changes requested by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7126



More information about the build-dev mailing list