RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4]

Wed Nov 16 12:29:17 UTC 2022

On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information:
>> 
>> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information.
>> 
>> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. 
>> 
>> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately.
>> 
>> Thanks,
>> Christian
>
> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
> 
>  - Always read full filename and strip prefix path and only then cut filename to fit output buffer
>  - Merge branch 'master' into JDK-8293422
>  - Merge branch 'master' into JDK-8293422
>  - Review comments from Thomas
>  - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+
>  - 8293422: DWARF emitted by Clang cannot be parsed

Changes requested by stuefe (Reviewer).

src/hotspot/share/utilities/elfFile.cpp line 1605:

> 1603:   uint32_t current_index = 1; // file_names start at index 1
> 1604:   const size_t dwarf_filename_len = 1024;
> 1605:   char dwarf_filename[dwarf_filename_len]; // Store the filename read from DWARF which is then copied to 'filename'.

Putting such a large array on the stack is a bit borderline. Especially in error reporting, where you may build up stack repeatedly via secondary crash handling. I realize though that no good alternatives exist. C-heap may be corrupted, ResourceArea is also out of the question. Could we get away with using filename directly?

src/hotspot/share/utilities/elfFile.cpp line 1636:

> 1634:   char* last_slash = strrchr(filename, *os::file_separator());
> 1635:   if (last_slash != nullptr) {
> 1636:     uint16_t index_after_slash = (uint16_t)(last_slash + 1 - filename);

Why uint16_t? 

We have `pointer_delta()` for that btw if you want to be super correct. See globalDefinitions.hpp

src/hotspot/share/utilities/elfFile.cpp line 1638:

> 1636:     uint16_t index_after_slash = (uint16_t)(last_slash + 1 - filename);
> 1637:     // Copy filename to beginning of buffer.
> 1638:     int bytes_written = jio_snprintf(filename, filename_len - index_after_slash, "%s", filename + index_after_slash);

I don't think this is guaranteed to work since the memory areas you move may interleave. You should copy char-wise, or use `memmove(3)`.

src/hotspot/share/utilities/elfFile.cpp line 1651:

> 1649:   int bytes_written = jio_snprintf(dst, count, "%s", src);
> 1650:   // Add null terminator.
> 1651:   dst[count - 1] = '\0';

Does it make sense to return a truncated file name up to the caller of `DwarfFile::LineNumberProgram::get_filename_from_header()`? Will this not just confuse him? I think it makes more sense to cleanly handle truncation, and e.g. skip file parsing for dwarf files with too long names.

src/hotspot/share/utilities/elfFile.hpp line 865:

> 863:     bool get_filename_from_header(uint32_t file_index, char* filename, size_t filename_len);
> 864:     static void strip_path_prefix(char* filename, const size_t filename_len);
> 865:     static void copy_dwarf_filename_to_filename(char* src, size_t src_len, char* dst, size_t dst_len);

Stupid question, do these have to be exposed? Or could they be just static helpers in elfFile.cpp?

-------------

PR: https://git.openjdk.org/jdk/pull/10287