RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v5]

Wed Mar 30 06:41:36 UTC 2022

On Mon, 28 Mar 2022 13:14:50 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> this is impressive work. It's a big change, and I had a look at part of it. I'll continue tomorrow.

Thanks a lot Thomas for your careful review! I'm in the process of working through your comments and will come back with an update today or later this week.

> In general, I'm concerned with the use of both UL and ResourceArea in this code. I know the use of UL has been discussed, but still. 

I agree that it is problematic but I think it would be good to keep some logging around when later coming back to the parser code (and that's the only reason I think that you ever want to turn these logs on). I can currently think of two options:

- Leave UL in and just guard it with an additional new develop flag to exclude the logs from unfiltered UL logging. This would allow us to kinda accept the risks for debugging purposes. That's not really a good design though but we could keep the log levels with their time stamps.
- Replace all UL calls with `tty` and also guard them with a new develop flag and play around with `Verbose` and `WizardMode` to keep the different log levels. That's not great either but I think it's safer to use and we only want the logs on rare occasions anyways - so it might be acceptable to use these verbose flags even though we should generally get away from them.

> Use of RA will prevent us from getting useful callstacks if we crash and Thread::current is NULL or invalid. I'd feel better if we were to consistently rely on an outside scratch buffer (like we usually do in error reporting). Even raw ::malloc would be better IMHO.

The idea of a scratch buffer sounds good. I'll check if I can replace all the `NEW_RESOURCE_ARRAY` usages with it.

> Another concern was safety, since this is a potential attack vector with manipulated Dwarf files, if someone manages to provove a crash. Maybe far fetched, but still. Would be good to get SonarCloud readings for this code e.g.

I was also concerned about that and I'm very thankful that you've spotted some issues already! I think minimizing the risk of a potential attack should be a top priority. We should definitely add some more checks. What do you think about the usage of `_JVM_DWARF_PATH` to load a DWARF file? I'm not sure how safe it is. I originally had it enabled for debug builds only.

> We see test errors on Linux ppcle and x64 in gtests:

Could you try running it with `-Xlog:dwarf=info/debug` in order to find out why it failed? It might not have found the symbols. Is the JTreg test `TestDwarf.java` working? But there is now another problem that since using GCC 11.2 (change done for Oracle builds with [JDK-8283057](https://bugs.openjdk.java.net/browse/JDK-8283057)), it emits unsupported DWARF 5 for some DWARF sections, at least on my machine, which is unfortunate. Maybe that's also the reason you see the failures if you use GCC 11.2. Maybe we can mitigate this problem by forcing GCC to use DWARF 4 for now. Could that be done by using the `-gdwarf` GCC flag? @erikj79

> We also see Problems in runtime/ErrorHandling and in jfr/jvm/TestDumpOnCrash. Mostly, these tests now have much longer runtimes (about factor 2). With TestDumpOnCrash, both the error file writer and the test itself timeouted on some of our slower machines.

Are these timeouts on ppcle and x64? We could also try to add `-Xlog:dwarf=info/debug` to the runs to get some rough idea of the time required to parse DWARF. I'll have a look at the these tests.

Thanks,
Christian

-------------

PR: https://git.openjdk.java.net/jdk/pull/7126