RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v9]

Fri May 20 14:57:27 UTC 2022

On Fri, 20 May 2022 09:37:03 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> When printing the native stack trace on Linux (mostly done for hs_err files), it only prints the method with its parameters and a relative offset in the method:
>> 
>> Stack: [0x00007f6e01739000,0x00007f6e0183a000],  sp=0x00007f6e01838110,  free space=1020k
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x620d86]  Compilation::~Compilation()+0x64
>> V  [libjvm.so+0x624b92]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec
>> V  [libjvm.so+0x8303ef]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899
>> V  [libjvm.so+0x82f067]  CompileBroker::compiler_thread_loop()+0x3df
>> V  [libjvm.so+0x84f0d1]  CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69
>> V  [libjvm.so+0x1209329]  JavaThread::thread_main_inner()+0x15d
>> V  [libjvm.so+0x12091c9]  JavaThread::run()+0x167
>> V  [libjvm.so+0x1206ada]  Thread::call_run()+0x180
>> V  [libjvm.so+0x1012e55]  thread_native_entry(Thread*)+0x18f
>> 
>> This makes it sometimes difficult to see where exactly the methods were called from and sometimes almost impossible when there are multiple invocations of the same method within one method.
>> 
>> This patch improves this by providing source information (filename + line number) to the native stack traces on Linux similar to what's already done on Windows (see [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)):
>> 
>> Stack: [0x00007f34fca18000,0x00007f34fcb19000],  sp=0x00007f34fcb17110,  free space=1020k
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x620d86]  Compilation::~Compilation()+0x64  (c1_Compilation.cpp:607)
>> V  [libjvm.so+0x624b92]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec  (c1_Compiler.cpp:250)
>> V  [libjvm.so+0x8303ef]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899  (compileBroker.cpp:2291)
>> V  [libjvm.so+0x82f067]  CompileBroker::compiler_thread_loop()+0x3df  (compileBroker.cpp:1966)
>> V  [libjvm.so+0x84f0d1]  CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69  (compilerThread.cpp:59)
>> V  [libjvm.so+0x1209329]  JavaThread::thread_main_inner()+0x15d  (thread.cpp:1297)
>> V  [libjvm.so+0x12091c9]  JavaThread::run()+0x167  (thread.cpp:1280)
>> V  [libjvm.so+0x1206ada]  Thread::call_run()+0x180  (thread.cpp:358)
>> V  [libjvm.so+0x1012e55]  thread_native_entry(Thread*)+0x18f  (os_linux.cpp:705)
>> 
>> For Linux, we need to parse the debug symbols which are generated by GCC in DWARF - a standardized debugging format. This patch adds support for DWARF 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still experimental and not generated for HotSpot. However, newer GCC version may soon generate DWARF 5 by default in which case this parser either needs to be extended or the build of HotSpot configured to only emit DWARF 4. 
>> 
>> The code follows the parsing steps described in the official DWARF 4 spec: https://dwarfstd.org/doc/DWARF4.pdf
>> I added references to the corresponding sections throughout the code. However, I tried to explain the steps from the DWARF spec directly in the code (method names, comments etc.). This allows to follow the code without the need to actually deep dive into the spec. 
>> 
>> The comments at the `Dwarf` class in the `elf.hpp` file explain in more detail how a DWARF file is structured and how the parsing algorithm works to get to the filename and line number information. There are more class comments throughout the `elf.hpp` file about how different DWARF sections are structured and how the parsing algorithm needs to fetch the required information. Therefore, I will not repeat the exact workings of the algorithm here but refer to the code comments. I've tried to add as much information as possible to improve the readability.
>> 
>> Generally, I've tried to stay away from adding any assertions as this code is almost always executed when already processing a VM error. Instead, the DWARF parser aims to just exit gracefully and possibly omit source information for a stack frame instead of risking to stop writing the hs_err file when an assertion would have failed. To debug failures, `-Xlog:dwarf` can be used with `info`, `debug` or `trace` which provides logging messages throughout parsing. 
>> 
>> **Testing:**
>> Apart from manual testing, I've added two kinds of tests:
>> - A JTreg test: Spawns new VMs to let them crash in various ways. The test reads the created hs_err files to check if the DWARF parsing could correctly find the filename and line number. For normal HotSpot files, I could not check against hardcoded filenames and line numbers as they are subject to change (especially line number can quickly become different). I therefore just added some sanity checks in the form of "found a non-empty file" and "found a non-zero line number". On top of that, I added tests that let the VM crash in custom C files (which will not change). This enables an additional verification of hardcoded filenames and line numbers.
>> - Gtests: Directly calling the `get_source()` method which initiates DWARF parsing. Tested some special cases, for example, having a buffer that is not big enough to store the filename.
>> 
>> On top of that, there are also existing JTreg tests that call `-XX:NativeMemoryTracking=detail` which will print a native stack trace with the new source information. These tests were also run as part of the standard tier testing and can be considered as sanity tests for this implementation.
>> 
>> To make tests work in our infrastructure or if some other setups want to have debug symbols at different locations, I've added support for an additional  `_JVM_DWARF_PATH` environment variable. This variable can specify a path from which the DWARF symbol file should be read by the parser if the default locations do not contain debug symbols (required some `make` changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. The JTreg test, however, also works if there are no symbols available. In that case, the test just skips all the assertion checks for the filename and line number.
>> 
>> I haven't run any specific performance testing as this new code is mainly executed when an error will exit the VM and only if symbol files are available (which is normally not the case when using Java release builds as a user).
>> 
>> Special thanks to @tschatzl for giving me some pointers to start based on his knowledge from a DWARF 2 parser he once wrote in Pascal and for discussing approaches on how to retrieve the source information and to @erikj79 for providing help for the changes required for `make`!
>>  
>> Thanks,
>> Christian
>
> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits:
> 
>  - Merge branch 'master' into JDK-8242181
>  - Merge branch 'master' into JDK-8242181
>  - Fix build, add GCC flag gdwarf-4 to exclude DWARF 5, add assertions
>  - Apply remaining review comments from Thomas Stuefe
>  - Change load_dwarf_file with DwarfFilePath and DebugInfo
>  - Revert renaming on Windows
>  - Merge branch 'master' into JDK-8242181
>  - Updating some comments
>  - Cleanup loading dwarf file and add summary
>  - Review comments of first pass by Thomas except dwarf file loading
>  - ... and 51 more: https://git.openjdk.java.net/jdk/compare/69ff86a3...5e105465

Thanks Thomas for doing the testing!

-------------

PR: https://git.openjdk.java.net/jdk/pull/7126